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1.0   Introduction 

The  purpose  of  this  study  was  to  identify  and  analyze  mathematical 
algorithms  for  possible  hardware  implementation.   Emphasis  was  placed  on 
the  evaluation  of  elementary  functions,  such  as  square  root,  logarithm  and 
exponential,  trigonometric  functions,  and  multiply  and  divide.   The  study 
was  directed  toward  algorithms  for  binary  computers,  although  some  references 
are  included  which  address  themselves  to  radices  10  [56],  16  [18],  and  -2 
[53],  [54],  [65]. 

We  have  not  assumed  that  any  fixed  degree  of  accuracy  was  required. 
Rather,  we  have  generally  concentrated  on  methods  which  are  flexible  enough 
for  the  accuracy  to  be  a  function  of  the  number  of  iterations  performed. 
We  have  generally  assumed  that  extremely  low  precision  is  not  sufficient, 
thus  ruling  out  consideration  of  methods  which  are  essentially  table-look-up, 
with  or  without  interpolation  (e.g.,  [23],  [52],  [57]).   Low  precision  inter- 
polation, such  as  Mitchell  [43],  and  variations  of  it  ([11],  [27],  [39]) 
have  not  been  studied. 

The  commitment  of  a  large  amount  of  hardware  can  sometimes  be  used  to 
decrease  the  execution  time  for  evaluation  of  functions.   The  use  of  cellular 
arrays  has  been  proposed  for  multiplication  [13],  division  [37],  [62],  [3], 
square  root  [9],  [22],  [38],  and  logarithm  [15].   While  this  idea  is  promising 
in  terms  of  speed,  we  have  concentrated  on  algorithms  to  be  implemented  with 
the  use  of  only  a  moderate  amount  of  parallel  processing. 

A  number  of  the  proposed  algorithms  are  unified  in  the  sense  that  with 
variations  in  certain  parameters  the  same  procedure  can  be  used  to  evaluate 
any  one  of  several  functions.   A  description  of  three  algorithms  of  this  type 
will  be  given  in  Section  3,  before  proceeding  with  a  discussion  of  algorithms 
for  specific  functions  in  Section  4.   A  short  consideration  of  error  analysis 
is  given  in  Section  2. 


2.0  Error  Analysis  Considerations 

The  error  involved  in  evaluating  a  function  consists  of  three  parts: 
(i)   Roundoff  error  is  accumulated  in  doing  the  necessary  arithmetic;  (ii) 
The  methods  are  interative,  and  convergence  is  obtained  only  to  within  a 
specified  tolerance,  thus  truncation  error  occurs;  and  (iii)   If  the  inter- 
mediate results  are  carried  to  additional  bits  of  accuracy  to  reduce  roundoff 
error  accumulation,  an  error  is  committed  by  reducing  the  number  of  bits  in 
the  final  result  to  machine  precision. 

We  will  generally  assume  that  we  are  dealing  with  fractional  numbers 
which  involve  N  fraction  bits.   This  is  compatible  with  the  notion  of 
floating  point  arithmetic,  where  the  value  of  the  exponent,  or  characteristic, 
is  taken  care  of  by  a  separate  normalization  procedure.  This  may  be  performed 
before  or  after  the  algorithm  we  discuss.   More  is  said  about  this  in 
discussing  the  various  functions. 

Suppose  that  M  operations  where  roundoff  error  may  occur  are  performed. 
If  the  error  committed  each  time  is  bounded  by  6,  the  total  error  can  not 

exceed  Me.   Assuming  N  fraction  bits,  we  would  probably  want  the  roundoff 

-N-l  -N-l 

error  to  be  bounded  by  2    ,  or  Me  <  2    .   Thus  -  log.e  >  N  +  1  +  log_M. 

Roundoff  error  is  decreased  by  increasing  the  precision  of  the  intermediate 

results;  say,  use  J  "guard"  bits  for  a  total  of  N  +  J  fraction  bits.   Then 

(1) 
where 

Thus,    (1)    in  the  above,  yields      -   log2K  +N+J+1>N+1+  log^l, 


*The  term  chopping  will  refer  to  simply  truncating  after  N  +  J  bits  without 
regard  to  the  N  +  J  +  U&  bit.   The  term  truncation  error  will  be  defined  later, 
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or 


(2)      J  > 


log  M    for  rounding 
log?M  +  1  for  chopping  . 


The  truncation  error  in  these  algorithms  occurs  when  a  function  g(x)  is 
to  be  evaluated  at  a  point  a,  and  actual  evaluation  point  is  a  -  y.   This 
error  is  the  result  of  truncating  an  infinite  process  after  a  finite  number 
of  steps.   If   g   is  differentiable,  which  it  is  in  our  case,  the  difference 
in  the  function  values  is  g(a)  -  g(a  -  y)  =  g'  (a*)y,  where  a*  is  between 

a  and  a   -  y.      The  number  y  is  related  to  the  convergence  criterion  used  in 

-N-l 
the  algorithm,  and  in  most  cases  it  will  be  bounded  by  2     ,  the  same  as  the 

roundoff  error  bound  we  will  assume. 

The  choice  of  bound  for  truncation  error  and  roundoff  error  should  be 

undertaken  together,  since  it  does  not  make  good  sense  to  choose  either  so 

that  the  truncation  or  roundoff  error  bounds  differ  significantly  from  each 

other.   That  is,  it  would  not  be  meaningful  to  do  enough  iterations  in  a 

-20 
calculation  to  make  the  truncation  error  as  small  as  2    if  the  roundoff 

error  could  be  as  big  as  2    .   Conversely,  it  is  wasteful  to  hold  roundoff 

error  to  2    if  one  is  trying  to  obtain  accuracy  to  2 

In  line  with  the  above  proposed  selection  of  J,  the  truncation  error 

-N-l 
bound  should  be  made  about  2     also.   The  total  error  could  then  be  as 

-N 
large  as  2   ,  in  the  calculated  value.   It  is  then  necessary  to  chop  or 

round  this  result  to  N  bits,  which  introduces  an  additional  error  of  at  most 

_N     -N-l 
2   or  2     ,  respectively.   The  final  value  could  then  be  in  error  by  as 

-N         -N 
much  as  2*2   or  3/2*2   .   These  are  bounds ,  and  in  the  usual  case  the  error 

will  be  smaller.   One  should  keep  in  mind  that  they  are  sharp  bounds,  in 

the  sense  they  may  be  approached  closely  in  a  given  case. 
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It  appears  that  it  would  certainly  be  worthwhile  to  round  to  N  bits 
after  the  final  iteration,  since  this  procedure  gives  a  significantly  smaller 
total  error  bound  than  that  for  chopping.   If  the  error  bound  given  previously 
for  the  results  of  the  iterations  (before  the  final  round)  are  not  sufficiently 

accurate,  the  bound  on  this  part  of  the  calculation  can  be  reduced  to  2   from 

-N 
2     by  the  performance  of  one  additional  iteration,  using  one  more  guard  bit. 

-N-l 
The  total  error  bound  can  never  be  made  smaller  than  2     ,  since  that  is  the 

bound  for  the  error  in  the  final  rounding  operation. 

The  error  bounds  given  in  Section  4  will  be  for  the  N  +  J  bit  result, 

before  the  final  round  to  N  bits,  the  added  error  bound  for  the  final  rounding 

being  the  same  for  all  cases  where  J  >  0. 

3.0  Unified  Algorithms 

Algorithms  which,  with  small  modifications,  can  evaluate  one  of  several 
functions  are  basically  of  three  types.   One  is  formulated  as  a  coordinate 
rotation  problem,  another  is  formulated  as  a  "pseudo-division/multiplication" 
process,  while  a  third  type  is  a  normalization  procedure.   We  will  discuss 
each  of  them  in  this  section. 

3.1  Coordinate  Rotation  Methods 

The  Coordinate  Rotation  Digital  Computer  was  first  discussed  by  Voider 
[64],  who  considered  rotations  in  the  usual  circular  coordinate  system.   It 
was  indicated  that  work  was  also  done  in  hyperbolic  and  linear  coordinate 
systems,  but  this  was  not  reported  in  detail.   An  earlier  report  by  Voider 
[63],  was  not  available.   Liccardo  [31]  did  a  master's  thesis  on  the  CORDIC 
methods,  and  included  the  hyperbolic  system.   He  also  outlined  procedures 
for  multiply  and  divide.   Linhardt  and  Miller  [34]  included  the  details  of 
the  hyperbolic  system,  but  not  the  linear  system.   Walther  [66]  presented 
the  unified  algorithm  and  his  paper  tied  together  the  rotations  in  the  three 
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different  coordinate  systems.   A  paper  by  Perle  [47]  also  appeared,  which 

included  a  method  of  obtaining  sin   x.   The  algorithm  for  sin  x  does  not 

fit  the  pattern,  in  that  it  does  not  generalize  to  hyperbolic  and  linear 

systems.   Very  recently,  Schmid  and  Bogacki  [56]  discuss  the  implementation 

of  the  CORDIC  algorithm  in  radix  10. 

Basically,  the  CORDIC  method  involves  taking  a  sequence  of  rotations 

(with  radial  distortion)  of  an  initially  rotated  coordinate  system.   The 

initial  angle  of  rotation  is   z   =  z.   A  point  (x  ,y  )  =  (x,y)  is  specified 

by  giving  its  coordinates  with  respect  to  the  rotated  coordinate  system. 

We  generate  the  following  sequence  of  points  by  the  rotations;   (x  ,y  ) , 

(x,y   ),..., (x    ,y   ) ,   where 
11  n     n 

x.in    =   x.    +  ms.6.y. 
l+l  l  ill 

(3)  y. ...    =  y .    -  6.s.x. 

J l+l        J i  ill 

Here  m  is  parameter  indicating  the  type  of  coordinates  for  the  rotations 

(1,  0,  -1  for  circular,  linear,  hyperbolic,  respectively),   s.  =  ±  1  determines 

the  direction  of  rotation,  and  the   6.   are  specified  constants.   The  angles 

z.   of  the  rotated  coordinate  system,  and  radii  R.   of  the  radius  vectors  to 
1  J  x 

the   (x.,y.)   are  given  by 


z  .  , ,  =  z  .  +  s  .a  . 
l+l    i    li 


2.* 


R#l1  =  R.(l  +  m6.")2,  where 
l+l    l       l 

(4)  a.  =  m  "  tan   (m  6.),  and 

l  l 

2     2  \ 
R  =  (x   +  y   ) 
o     o     o 
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z   is  specified,  as  indicated  previously.   We  see  that  each  (x  ,Y.)  is  the 
o  i  i 

location  of  the  (radially  distorted)  point  (x  ,y  ),  with  respect  to  a 

o  o 

coordinate  system  rotated  through  angle  z..   Also  note  that 

z  =  z  +  a 
n    o 

R  =  R  K   ,  where 
n    o  m 


(5) 


n-1 

a  =  /  s . a .  ,  and 
,L      11 


i=o 


n-1         2  u 

K  -  n   (1  +  mS.    )2 
m  i=o        l 


are  independent  of  the  (x.,y.),  except  insofar  as  they  may  influence  the   s  , 

In  terms  of  the  initial  values  x  and  y  ,  it  can  be  shown  that 

o      o 


h  h  h 

x  =  K  {x   cos  (am  )  +  y  m  sin  (am  ) } 
n    m  o  o 


(6) 


h, 


-h 


y  =  K  {y  cos  (am  )  -  x  m   sin(am2)} 
n    m  o  o 


It  is  necessary  that  the  initial  values  of  x  y   and  z  be  restricted, 

o  o       o 

both  for  purposes  of  representing  them  in  the  computer,  and  to  guarantee 

convergence  of  the  algorithm.  We  say  more  about  this  later.  With  the 

appropriate  restrictions  on  x  y   and  z   and  judicious  choice  of  the  s . , 
rr  r  o  o       o      J  i 

the  values  of  x  }y  y    and  z   can  be  forced  to  approach  that  values  indicated 
n  n       n 

in  Table  1.   The  terms  rotation  mode  (s.'s  chosen  to  force  z   to  zero)  and 

l  n 

vectoring  mode  (s.'s  chosen  to  force  y   to  zero)  were  coined  by  Voider  and 
l  n 

are  descriptive.   The  choice  of  s.  is  as  follows.   In  rotation  mode 

l 


s.  = 

l 


if   z  .  <  0 

i 


-1   if   z.  >0. 

l 
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In  the  vectoring  mode 

1   if   y.  >  0 


Si 


-1    if    y   <  0 


The  latter  is  dependent  on  x.  being  non-negative.   This  is  easily  adjusted 

for  in  cases  where  x   is  negative,  as  we  will  note  in  the  pertinent  sections. 

The  choice  of  6 .  is  critical  to  the  entire  procedure  and  in  order  to 
l  r 

facilitate  computation,  we  want  each  6.  to  be  a  power  of  two.   Thus  the 

transformations  (in  a  binary  computer)  are  accomplished  by  shifting  and 

adding.   In  order  that  the  radial  distortion  constant,  K  ,  be  independent 

m 

of  the  input  data,  it  is  necessary  that  the  magnitudes  of  the  6.  be  indepen- 
dent of  the  input  data.   Thus  the  same  sequence  of  rotations  must  always  be  done, 
except  that  the  direction  of  rotation  may  vary.   Walther  gives  the  choice  of  6. 

given  in  Table  2.   We  also  list  the  maximum  value  of  a  obtainable  with  this 

sequence,  as  well  as  the  corresponding  value  of  K  and  the  number  of  iterations, 

-N-l 
N  ,  required  so  that  a„,   ~  2 
m  N 

m 

We  will  see  that  with  appropriate  range  reduction  techniques,  discussed 
for  the  individual  functions  in  the  next  section,  that  the  sequences  given 
in  Table  2  are  adequate. 

Because  the  last  rotation  of  magnitude  cl,  must  be  accomplished,  the 

m 

value  of  z     can  be  as  large  as  a„  ,  in  the  rotation  mode,  and  the  value 
N  +1  N 

m  m 

of  y    ..  can  be  as  large  as  x   6    in  the  vectoring  mode.   The  truncation 

m  mm 

error  for  the  CORDIC  algorithm  is  based  on  a„  ,  then. 

6  N 

m 
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m 


lim  x 
n-*»   n 


K..  (x  cos  z 


Rotation  mode 


lim  y 
n-*»  n 


lim  z 
n-x»  n 


y  sin  z)   K  (y  cos  z  +  x  sin  z) 


x 


y  +  x*z 


-1      K   (x  cosh  z  +  y  sinh  z)  K   (y  cosh  z  +  x  sinh  z) 


Vectoring  mode 


Kx(x2  +  y2)h 


z  +  tan  y/x 


z  +  y/x 


-1      K  , (x2  -  y2)^ 


z  +  tanh  y/x 


Table  1 


m  p.  sequence  (6.  =  *  2  i 

1  0,1,2, ...,  n 

0  1,2,3, ...,  n 

•1  1,2,3,4,4,*. .. ,  n 


Max  a 
-1.74 

1.0 
-1.13 


K 
m 

-1.65 
1.0 
-.80 


N 
m 

N  +  2 
N  +  1 
N  +  1  +  R* 


*Repeated  values  are  4,13,...,k,  3k  +  1,  R  denotes  the  number  of  repeated 
values,  e.g.,  if  N  =  16,  R  =  2. 

Table  2 
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-J*  -1     i< 

The  values  of   K     and  a     =  m    l  tan      (in  6),    i  =   1,2,...,   N    ,   must  be 
mi  I  m 

available,  and  as  with  most  algorithms  in  this  class,  it  is  assumed  these 
will  be  stored  in  a  read-only-memory.   Note  that  in  practice,  for  the 
algorithm  to  operate  for  all  three  values  of  m,  values  of  the  a  for  all 
three  m,  (tan   2,2,  and  tanh  2   )  would  need  to  be  stored. 

The  greatest  asset  of  the  CORDIC  algorithm  is  its  versatility.   It  is, 
in  fact,  even  more  versatile  than  would  appear  at  first  glance,  since  functions 
related  to  those  of  Table  1  can  be  evaluated  by  pre  -  or  post  -  manipulation 
of  the  data.   For  example,  •w"  may  be  obtained  by  setting  x  =  w  +  ^,  Y   =  ^   -  h, 
and  entering  the  algorithm  in  the  vectoring  mode  with  m  =  -  1,  followed  by 
a  division  by  K   . 

The  main  disadvantage  of  the  method  is  that  it  requires  a  fixed  number 

of  iterations  (rotations),  whether  needed  or  not,  to  avoid  changing  the 

constant  K  .   DeLugish  [16]  gives  a  partial  solution  to  the  problem,  which 
m 

we  will  discuss  later.   Also,  the  CORDIC  algorithm  could  be  used  in  part, 
as  we  note  later,  in  Section  4.3.3,  such  that  the  radial  distortion  constant 
is  eliminated. 

3.1.1  Error  Reduction  for  Certain  Functions 

For  functions  which  are  zero  when  the  argument  is  zero,  special  pro- 
cedures must  be  used  to  obtain  results  which  are  accurate  to  N  significant 
digits.   Generally  the  normalization  procedures  handle  this  automatically, 
but  we  will  discuss  briefly  some  functions  where  this  is  not  the  case. 

When  the  sine,  tangent,  or  arctangent  of  a  small  argument,  in  floating 
point  form,  is  desired,  we  would  like  the  result  to  be  accurate  to  as  many 
significant  digits  as  possible.   The  procedures  outlined  in  Section  4  do  not 
result  in  this  desired  accuracy.   We  will  note  in  Section  4  that  all  the 
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algorithms  for  these  functions  reduce  to  the  CORDIC  algorithm,  essentially, 
and  Walther  [66]  has  treated  this  problem.   The  solution  is  to  scale  the 
argument  by  an  appropriate  power  of  two,  beginning  the  rotations  with  an 
appropriately  small  angle  and  continuing  for  the  usual  number  of  rotations. 
This  requires  the  inclusion  of  a  set  of  distortion  constants,  depending  on 
the  initial  angle  of  rotation,  as  well  as  additional  values  for  the  a.. 
The  details  of  the  implementation  can  be  found  in  Walther.   The  above  should 
also  be  done  for  the  hyperbolic  system  if  the  hyperbolic  sine,  tangent,  and 
arctangent  are  desired.   If  the  hyperbolic  system  is  to  be  used  to  calculate 
the  exponential,  logorithm,  and  square  root,  the  increased  accuracy  there 
is  more  apparent  than  real,  since  the  initial  or  final  transformation  negates 
the  increased  accuracy. 

3.2  Pseudo-Division/Multiplication  Methods 

Consider  the  division  of  y  by  x  using  the  method  of  successive  sub- 
tractions.  This  can  be  organized  (in  binary)  as  follows.   Let  y  and  x  be 

normalized  so  that  \  <   x,y  <  1.   Then  h  <   v/x  <  2.   Let  the  quotient  be 

n     _ . 
represented  by  the  number  Q  . =£  q.2   ,  where  each  q.  is  0  or  1.   The  q. 

i=0 
are  determined  by  the  following  loop.   Let  D  =y,d  =x,Q  =0. 

For  i  =  0,1,...,  n,  let 


qi  = 


Then  D    =  2(D  -  q^) 

dt+i  '  di 


r  0   if   D.  >  d.  ,  or 
l    l 


1   if   D.  <  d^  , 
l    i 
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n-1 

For  multiplication,  the  procedure  is  reversed.   Let  x  =  7   x  2  '  .   Then 

i 
i=0 

the  product  xv  =  P   is  generated  by  P   =  0,  y   =  y,  and  for 
i  =  0,1,..  .  ,  n  -  1, 

<Vi  "  2(pi  +  xiV 
yi+i  ■  yi 

The  product  is  assumed  to  be  accumulated  in  a  register   of  length  2n+l  bits. 

At  the  end  of  the  multiplication  the  product  has  been  shifted  n+1  bits, 

into  the  most  significant  n  bits  of  the  register,  and  chopping  (or  rounding) 

to  the  most  significant  n  bits  gives  the  result.   Error  occurs  only  when 

the  least  significant  half  is  chopped  or  rounded  off. 

A  pseudo-division  or  multiplication  is  a  procedure  like  one  of  the  above, 

where  the  q.'s  or  x.'s  are  not  necessarily  the  coefficients  in  radix  2,  and 
11 

the  divisior  d.,  or  multiplicand  y.,  may  be  modified  at  each  iteration. 

Meggitt  [40]  devised  a  class  of  these  methods  for  division/multiplication, 
logarithm/exponential,  tangent/arctangent,  and  square  root.   While  Meggitt' s 
paper  was  developed  for  radix  10,  the  conversion  to  any  other  radix  is  simple. 

Meggitt' s  algorithms  correspond   to  a  restoring  division.   The  test  for 

D.  h   d.  would  be  done  by  computing  a  tentative  value  of  h   D .  .  n  ,  h   D.... 
11  i+1      l+l 

D.  -  d.,  then  testing  h   D._/,   for  sign.   If  %   D-+-i   *s  non-negative,  q.  =  1, 

and  D...  =  2(h   D.  £?').   If  %  T>.^      is  negative,  q.  =  0  and  D...  = 
l+l        l+l  l+l       °       ni  l+l 

2(h   D.;,   +  d.).   That  is,  we  must  "restore"  the  value  of  D.  in  the  latter 
l+l     l  l 

case. 

It  is  not  necessary  to  restore  the  value  of  D . ,  however,  since  it  is 
possible  to  compute  ^  D-+9  directly  from  D.+1  •  Suppose  q.  =0,  then  we 
have 
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h  Di«  -  Dl+1  -  di+l  - 2  Di  -  dl+l 


a.   „  (t) 


"  2ft  D  t+l  +  di>  "  di+l 


"  Dm  +  dl  "  (dl+l  -  di>  • 


Depending  on  (d    -  d.),  this  expression  may  or  may  not  be  easier  to  compute 

than  by  first  restoring  D..   In  normal  division,  d#11  =  d,,  so  in  that  case 

1  l+l    i 

it  is  difinitely  easier  to  use  the  above,  non-restoring  division.   We  then 
compute  the  sequence  D.    rather  than  D.,  with 

h   D.^}  =  Dft}  -  d.  if  q.  =  1,  or 
l+l     l      li 


^  D.J,   =  D.   +  d.  if  q.  =  0,  and  that  q .  is  1  or  0  as  D;   is 
l+l     ill  i  i 

non-negative  or  negative,  respectively. 

Sarkar  and  Krishnamurthy  [55]  modified  Meggitt's  algorithms  to 

correspond  to  a  non-restoring  pseudo-division/multiplication.   This  results 
in  a  faster  algorithm.   It  would  appear  this  is  more  advantageous  in  radix 
10  than  in  radix  2.   The  above  mentioned  paper  incorporated  a  possibility 
of  restoration  or  non-restoration,  depending  on  which  appeared  to  be  more 
advantageous,  i.e.,  which  left  the  smaller  dividend. 

We  list  in  Tables  3  and  4  the  initialization  and  iteration  equations 
for  Meggitt's  algorithms  in  radix  2.   In  cases  where  it  is  assumed  a  number 
is  expressed  in  a  variable  radix  (log(l  +2   )  or  tan   2   ),  this  must  be 
first  obtained  by  a  modified  division.   The  procedure  given  by  Meggitt  is 
as  follows,  where  D  is  to  be  recoded  in  one  of  the  above  radices. 


D  =  D 
o 
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For  i  =  0,1, ... ,  n  -  1 


d±   =  21  log(l  +  2  X)  ,  [or  :  21  tan  1   2   ±] 


q±-       < 


1   if    D.  >  d, 
i    1 


0   If    D.  <  dJ 
l    i 


D±+i =  2(Di "  VP 


At  the  end  of  the  loop,  we  have 


n-1  _.         n-1 

D  =  I     q   log(l  +  2  X)  ,  [or:   £  q   tan  X  2  X]  , 
i=0  X  1=0  x 


-1  ~-i 


with  remainder  D  /2  . 
n 

The  advantage  of  the  pseudo-division/multiplication  processes  is  that 
they  look  very  much  like  multiplication  and  division,  and  could  be  imple- 
mented with  little  additional  hardware.   The  routines  are  inherently  accu- 
rate and  insensitive  to  roundoff  error,  and  accuracy  comparable  to  other 
methods  can  be  obtained  with  only  one  guard  bit.   The  use  of  one  double 
length  register  is  necessary,  although  with  some  reformulation  this  might 
be  replaced  by  one  having  only  a  sufficient  number  of  guard  bits,  about 
log2n+l. 

The  evaluation  of  some  functions  require  a  modified  division  to  recode 
an  argument,  followed  by  a  psuedo-multiplication,  or  a  pseudo-division, 
followed  by  a  modified  multiplication,  and  as  outlined,  some  of  these 
must  be  done  serially,  not  in  parallel. 


-13- 


X 


X 


s 

■u 


I- 


O 


X 


1 

•H 

CN 

X 

1 

.H 

/—N 

+ 

•H 

•H 

X 

1 

•H 

■H 

CN 

1 

a' 
1 

•H 

cr 

CN 

•H 

•H 

X 

cr 

>> 

X 

CN 

» ' 

CM 

X 

•H 
CN 

1 

d 

•H 

1 

CN 
tH 

1 

•H 

•H 

d 

X 

CN 

CO 

•H 

1 

■u 

cr 

l 

CN 

•H 

•H 

•H 

cr 

cr 

>. 

X 

CN 

X 

1 

d 

•H 
1 
CN 

+ 
t-H 

/— N 

•H 

^— ' 

■H 

X 

00 

X 

•H 

o 

•H 

1 

H 

CT1 
1 

CN 

•H 

•H 

cr 

cr 

>, 

X 

CN 
^^ 
CN 

•H 

X 

•H 
CT 
1 

•H 

i 

•H 
X 

1 

d 

1 

CN 

•H 

cr 

>, 

X 

CN 
CN 

•H 

X 

rH 
1 

d 

Wl 

<u 

> 

•H 

4-> 

CJ 

0) 

o. 

CO 

CD 

d 

U 

o 

•H 

ft 

CO 

•H 

•H 

X 

> 

•H 

V 

■ 

•H 

i 

O 

N 

T3 

3 

J-i 

0) 

O 

CO 

eu 

•H 

X 

4-1 

4-1 

Al 

•H 

00 

•H 

00 

CN 

0) 

2 

CO 

d 

•  • 

tH 

CO 

)-i 

CD 

o 

>H 

43 

o 

CO 

H 

CD 

4-1 

o 

2 


CN 


T3 

4-J 

d 

u 

d 

CD 

o 

CD 

T3 

CO 

•H 

•H 

•H 

4J 

>         rH 

> 

tH 

o 

•H        + 

•H 

+ 

d 

T3       -H 

-d 

•H 

cr 

1     CN 

1 

X 

I 

O 

o 

o 

T3 

X) 

T) 

3 

d 

d 

0) 

a) 

a) 

CO 

CO 

CO 

PL, 

Pm 

Ph 

-14- 


CM 


a 

CO 
H 


+ 


+ 


•H 
CN 


W 


•H 
I 
CN 


c 

CO 


w: 


CN 

+ 


60 
O 


W) 


•H 

I 
CN 

W 


1 
•H 

1 

CN 

X 

+ 

rH 

/ - 

+ 

•H 

•H 

X 

1 

•H 

CN 

cr 

•r 

+ 

O* 

•H 

X 

co 

1 

CM 

X 

n 

CO 

u 

■U 
'H 

CO 


cr 

CN 
CN 


X 

•H 

+ 

•H 
CN 


CN 


cr 

+ 

•i- 

CN 


CN 

i-H 


cr 

+ 


CN 


cr 

•H 

CM 

1 

e 

CN 

o 

1 

•H 

•H 

4J 

X 

CO 

a 

•H 

o 

OJ 
CO 

CM 


■u 

•rl 
60 

OJ 

S 


QJ 
rH 

•s 


0) 
•H 

•H 


f 

O 

-a 

3 
OJ 

to 


o  c 

CN  CM 


+ 


N 


1 

+ 

1 

•H 

•H 

•H 

CM 

X 

X 

•U 

U 

3 

X) 

T3 

c 

O 

ctj 

H 

o 

a- 

•H 

i 

1       -H 

o 

O     CI- 

•o 

TS    -H 

3 

3    w 

OJ 

QJ    i-H 

CO 

CO      3 

CU 

CM    £ 

-15- 


3.3  Normalization  Methods 

Consider  a  function  of  two  variables  which  is  to  be  subjected  to  a 

series  of  transformations  under  which  the  function  value  is  to  be  invariant. 

The  first  variable  will  be  transformed  to  a  specified  value  while  the  second 

is  altered  in  such  a  way  that  its  value  is  forced  to  approach  the  desired 

function  value. 

For  example,  consider  the  function  y  +  log  x.   Let  x  =  x,  y  =  y,  and 

o       o 

define  the  sequence  (x.,v.)  by  x    =  a  .x . ,  Y ..-,   =  Y.   -   log  a  ,  where  the 

a.  are  positive  constants. 
1 

Then  we  have 

y  +  log  x  =  y  +  log  x  =  y  +  log  x  =  . . .  =  y  +  log  x  . 
o       o    1       ±         n       n 

If  the  a.  are  chosen  in  such  a  way  that 

x 

lira   x  =  1,  we  see  that  lim  y  =  y  +  log  x. 
n-*»  n  n-*>°  n 

Thus,  as  x  is  "normalized"  to  one,  y  is  forced  to  the  desired  function 
n  n 

value. 

The  above  is  but  one  example  of  a  function  evaluated  by  a  normalization 
method.   Others  amenable  to  the  same  approach  are  given  in  Table  5,  along 
with  appropriate  transformations  for  the  variables. 


Function        (x  ,y  ) 

o  o 


(Xi+l'yi+l> 


lim(x  ,y  ) 
n"*00  n'  n 


y  +  log  x  (x,y) 

y/x  (x,y) 

w/x^  (x,y) 

yeX  (x,y) 


0*iXi*yi~  log  a±) 
(a.x^a.y.) 

(ai2xi»aiyi) 

(x±  -  log  a±»aiY±) 


(l,y  +  log  x) 
(l,Y/x) 

(i,y/x  h) 
(o,yex) 


Table  5:  Normalization  Methods 
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A  number  of  authors  have  addressed  themselves  to  the  implementation 
of  the  normalization  methods,  including  (in  historical  order),  SpecHer  [61], 
Perle  [46],  DeLugish  [16],  and  Chen  [5].   The  principal  matter  to  be  decided 

— n 

is  how  the  a.  should  be  chosen.   Clearly  a.  is  to  be  of  the  form  1  +  s.2   i, 
l  l  l 

where  p.  is  an  integer,  and  s.  =  ±  1  or  0.   Ideally,  the  a.  should  be  chosen 
l  l  l 

so  that  x   is  forced  to  within  a  specified  tolerance  of  its  limit  value  for 
n 

as  small  an  n  as  possible.   We  shall  discuss  the  various  strategies  for 
choosing  the  a.  as  we  consider  the  various  functions. 

Chen  suggests  the  use  of  a  termination  algorithm  which  decreases  the 
number  of  iterations  required.   Essentially  the  idea  is  to  use  a  one  term 
Taylor  series  expansion  when  the  most  significant  half  of  the  number  has 
been  computed.   This  is  generally  done  at  the  expense  of  a  half  precision 
multiply  (i.e.,  the  most  significant  half  of  the  multiplier  is  represented 
by  all  zero  bits).   It  does  have  an  additional  advantage  in  that  it  halves 
the  round-off  error  accumulation  since  only  about  one-half  as  many  iterations 
are  required.   We  will  discuss  the  individual  termination  algorithms  in 
the  next  section. 

The  advantage  of  the  normalization  methods  is  their  simplicity  (depending 
on  how  one  decides  on  a.),  and  their  potential  speed.   Of  course  they  are 
not  as  versatile  as  the  CORDIC  algorithm  and  no  normalization  methods  have 
been  devised  for  trigonometric  functions. 

4.0  Algorithms  for  Specific  Functions 

4.1  Quotient 

There  are  several  kinds  of  division  algorithms.   We  have  previously 
mentioned  the  CORDIC,  successive  subtraction,  and  normalization  algorithms. 
Another  normalization  method  using  multiplication  and  based  on  Newton's 
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method  has  been  proposed.   This  procedure  converges  quadratically ,  and 

thus  is  best  suited  when  high  precision  is  required.  Flynn  [20]  discusses 

a  number  of  iterative  techniques  for  division.   Another  method  is  based  on 

a  reciprocal  generator.   Huber  [24]  did  a  thesis  on  binary  division  algorithms 

which  discusses  most  of  the  above  methods. 

Suppose  that  we  desire  the  quotient  X^Y      ,  where  Y  =  2  «y ,  X  ■  2  *x, 
with  a,B  integers  and  h  <   x,y  <  1,  that  is  X  and  Y  are  expressed  in 
normalized  form.   In  some  algorithms,  we  will  assume  y  <  x,  relaxing  the 
requirement  that  both  x  and  y  be  in  [^,1).   This  leads  to  the  quotient  being 
in  [^,1),  hence  no  normalizing  shift  would  be  required  for  the  quotient. 
In  any  case,  we  concern  ourselves  with  the  generation  of  the  quotient  y/x. 

4.1.1   CORDIC  Algorithm 

Inspection  of  Tables  1  and  2  reveal  that  N  +  1  iterations  are  required. 

Because  of  the  sequence  of  a's  for  this  case  we  must  have  y  <  x;  this  can 

be  accomplished  by  right  shifting  y  one  place,  if  necessary,  or  always.   To 

avoid  a  possible  left  shift  to  normalize  the  quotient,  the  right  shift  of 

y  should  be  done  only  if  necessary. 

After  N  =  N  +  1  iterations  we  obtain 
o 

N 

yN  +1  =  Yo  "  Xo   £   SiV 
o  i=0 

N 
o 

then      I  yXT    ,  n  I    =    |  y     "  x        I     s .  a .  |      <     x  a     =  x  «         . 
'N+l1         '    o  o.^ii1  N  N  oN 

o  i=0  o 

N 
o 

Since   z„  , ,  =  -   7  s.a.,  the  above  yields,  with  z  =  0  , 
N+l     .Ln     l  l  o 

o       i=0 

\J    ,x         i         «-N-l    .  .  ,    ,  ,  ,   „-N-l 

o/o-z   ,<a+  =2    .   Thus  the  truncation  error  is  bounded  by  2 
1  N  +1 '     N 

o        o 
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If  a  left  shift  were  necessary  to  normalize  the  quotient,  the  error  would 

-N+l 
become  as  large  as  2    ,  thus  we  see  why  it  is  important  to  right  shift 

Y  only  if  necessary. 

4.1.2  Restoring  and  Non-restoring  Division 

It  would  again  be  advantageous  to  have  y  <  x  and  y/x  in  [^,1)  to  avoid 
a  normalization  shift  at  the  end  of  the  operation.   Also,  one  would  then 
avoid  generating  N+l  quotient  bits  when  only  N  are  necessary,  since  the 
first  bit  would  always  be  1. 

If  N  quotient  bits  are  generated,  there  will  be  as  many  as  N 
subtractions  of  the  divisor  from  the  dividend.   Assuming  left  shifts  of  the 
dividend  rather  than  right  shifts  of  the  divisor,  no  roundoff  error  will 
occur  during  these  operations.   The  truncation  error  will  be  determined  by 
the  size  of  the  remainder  divided  by  the  divisor,  and  will  be  bounded  by 
K*  2     ,  where 

II   if  a  final  round  is  performed 
2   if  chopping  is  used 

Thus  the  error  is  at  most  one  bit. 

If  N  +  1  quotient  bits  are  generated  with  a  possible  right  shift 
required  to  normalize  the  quotient,  the  error  bound  is  the  same. 

Nandi  and  Krishnamurthy  [44]  proposed  a  procedure  for  a  non-restoring 
type  of  division  for  radix  $.   It  obtained  the  quotient  in  redundant  form, 
and  was  especially  designed  for  divisors  with  leading  coefficient  1  or  3  -  1. 
While  the  procedure  could  be  adapted  for  radix  2,  there  is  a  conflict  in 
the  decision  process,  which  differs  depending  on  whether  the  leading  coefficient 
of  the  divisor  is  1  or  $  -  1.   These  digits  coincide  in  radix  2,  of  course. 
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The  decision  process  for  the  quotient  bits  is  somewhat  complicated,  as 
well. 

The  advantage  of  redundant  representation  (using  ±  1  and  0  as  coefficients, 
instead  of  just  1  and  0)  of  function  values  is  that  more  zero  bits  can  be 
"built  in".   Every  number  has  a  minimal  representation,  i.e.,  a  representation 
with  a  minimal  number  of  non-zero  bits.   Metze  has  investigated  this  idea 
for  several  functions,  one  of  which  is  division  [41] •   Basically  the  idea 
is  an  extension  of  non-restoring  division,  and  contains  S-R-T  division  [50] 
as  a  special  case.   The  decision  as  to  the  value  of  the  quotient  bit  is 
based  on  the  value  of  the  previous  remainder.   The  decision  is  somewhat 
complicated,  with  the  comparison  constant  a  function  of  the  divisor. 

The  following  procedure  is  that  given  by  Metze  for  minimally  represented 

quotients.   The  comparison  constant,  K,  is  obtained  from  Table  6.   We  first 

assume  that  h  ^   x  <  1  and  0  <  y  <  x.   We  just  as  well  assume  that  H  ^   y/x  <  1. 

n    _. 
Let  Q  -  y/x  =  £  q.2   ,  Q   =  0,  2R_  =  x.   For  i  =  0,1,...,  N   , 

i=0  1 
determine  q.  by 


qi  = 


Then  2R.,  =  2(2R.  ,  -  q.  x) 
l       l-l    l 

Q.  =  Q.  ,  +  q.2"1 
i   i-i   i 

The  comparison  constants  given  in  Table  6  are  not  particularly  con- 
venient, but  it  is  the  smallest  number  of  different  ones  possible.   Another, 
more  convenient  set  is  given  in  Table  7. 

For  S-R-T  division  the  comparison  constant  is  K  =  h   for  all  divisors, 
and  a  minimally  represented  quotient  is  obtained  for  divisors  in  the  range 
[3/5,  3/4]. 
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1 

if   2R.  .  >  K 
i-i 

-1 

if    2^.1  <   K 

0 

otherwise 

Range  of  divisor 

K 

[1/2,  39/64) 

13/32 

[39/64,  3/4) 

1/2 

[3/4,  15/16) 

5/8 

[15/16,  1) 

3/4 

Table  6 

Range  of  divisor 

K 

[1/2,  9/16) 

3/8 

[9/16,  5/8) 

7/16 

[5/8,  3/4) 

1/2 

[3/4,  15/16) 

5/8 

[15/16,  1) 

3/4 

Table  7 
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A  minimal  recoding  can  be  obtained  by  requiring  that  non-zero  bits  be 

N+l 
separated  by   at  least  one  zero,  thus  at  most  [  o  ]  bits  can  be  non-zero 

(see  [41],  first  paragraph).   Because  of  the  left  shift  of  the  remainder, 

R. ,  no  roundoff  error  is  introduced.   It  is  easy  to  prove  that 

y  =  x-Q,  +2  R.  thus  for  k  =  N,  we  have  truncation  error  y/x  -  Q  =  

k       k.  N     x 

We  show  by  induction  that  |R  I  <  x.   R   =  x/2,  and  assume  that  I R,   I  <  x. 


Then 


if  qk  =  0,  Rk  =  2Rk_1,  and  |R^|   =  | 2R   |  <  K,  by  the  selection  rule 


for  q, ,  and  the  recursion  formula.   Tables  6  and  7  both  show  that  K  <  x 
k 

thus  IR^I  <  x.   If  q,    =  ±  1,  we  have  R.  =  2R^_1  -  sign  (R^   )x,  or 
IR,  I  =  1 2 1  R,_-L  I  ~  xl  <  x,  since  by  induction  IR,  J  <  x.   Thus  |R-|  <  x 
for  all  k,  and  in  particular,  |R.^|  <x,  thus  the  truncation  error 


2'\ 


-N 
<  2   ,  or  less  than  one  in  the  least  significant  bit. 


x 

Note  that  no  shift  for  normalization  is  required.   In  order  to  be 

assured  that  y  <  x,  however,  it  may  be  necessary  to  right  shift  y   initially. 

-N-l 
If  no  guard  bit  is  provided,  this  could  cause  an  error  of  2     in  y ,  and 

since  2    /x  <  2   ,  we  see  that  the  error  could  be  as  large  as  2    »  or 

the  last  two  bits  could  be  in  error.   The  error  in  y  can  be  eliminated  with 

one  gaurd  bit. 

Metze  makes  no  mention  of  the  average  number  of  non-zero  quotient  bits. 

The  probability  of  a  zero  occur ing  in  S-R-T  division  is  about  .63,  and  Metze 's 

algorithm  must  do  better  than  that.   We  noted  previously  we  must  have  about 

one-half  of  the  digits  equal  to  zero,  although  it  is  unlikely  that  in  the 

average  case,  half  of  the  remaining  digits  will  also  be  zero.   The  probable 

number  of  non-zero  bits  is  likely  between  N/4  and  N/3. 
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4.1.3  Normalization  Methods 

As  we  noted  earlier,  the  principal  matter  to  be  decided  here  is  a 

strategy  for  the  choice  of  a. (see  Table  5).   We  assume  that  a,  =  1  +  s.2   i  , 
°  l  i        i 

where  s  =  ±  1  or  0,  and  p.  is  an  integer.   The  iteration  procedure  is 

x  =  x,  y  =  y,  and  for  i  =  0,1,2,...,  n-1 
o       o 

Xi+1  =  Xiai 

y±+i  =  yiai  • 

Further  we  assume  that  %  <   x  <  1.   We  discuss  several  strategies  for 

choice  of  a.. 

l 

The  first  is  very  simple,  and  reminiscent  of  the  CORDIC  algorithm. 

1  +  2"1   if   x.  <  1 

l 

a.  = 
1  -i 

1-2     if   x.  >  1 
i 

It  is  easily  seen  that  1-2   <  x. ,,  <  1  +  2   ,  thus  after  n  iterations 


ly  +1  -  y  /x  I   =   ly  +1  -  y  .-«  +1 1  =   1%^  •  y  ..  \*  2~n  ^±1  =  2"ny/x    . 

yn+l    o  o      n+1    n+1/  n+1      x  §1     n+1       x  in 


K  +1"1 
h+1 


>-n 


Thus,  the  relative  error  is  2   ,  and  if  x  and  y  are  normalized  as  usual, 

st 
h  <   y/x  <  2,  hence  the  truncation  error  is  less  than  one  in  the  (n-1) 

significant  bit.   Roundoff  error  would  accumulate  up  to  log  n  +  1  bits, 

thus  if  n  =  N  +  1  iterations  are  used,  log-(N  +  1)  +  1  guard  bits  would 

give  an  error  no  greater  than  one  in  the  last  bit. 

Another  similar  strategy  is 

1  +  2_1   if   x.(l  +  2_i)  <  1 


a.  = 

i 


V 


otherwise 


-23- 


This  division  procedure  is  the  analogue  of  the  Specker  [61]  and  Perle  [46] 
algorithms.   It  has  the  seeming  advantage  of  skipping  some  iterations 
(if  a.  =1),  and  the  real  advantage  of  keeping  x.  <  1.   However,  note  that 
the  decision  requires  calculation  of  a  tentative  value  of  x  - ,  hence  no 
saving  is  obtained  there.   The  roundoff  error  would  be  decreased  in  the 
average  case,  but  not  the  bound.   Truncation  error  is  the  same,  although 
one  knows  its  sign  and  could  compensate  after  the  last  iteration. 

DeLugish  [16]  gives  a  more  sophisticated  strategy  for  choice  of  a.. 
DeLugish  normally  requires  an  initialization  procedure.   In  this  case  it  is 


< 


if   1/2  <  x  <  3/4 
o 


1   if   3/4  <  x  <  1 
o 


and  x.  =  a  x  ,  y.  =  a  y 
1    o  o  Jl         o"o 

Now,  as  one  would  likely  do  in  practice  in  the  previous  examples, 

DeLugish  does  not  actually  compute  the  sequence  of  x.'s.   Rather,  for  ease 

in  testing,  we  compute  the  sequence  u.  =  2  (x.  -  1),  and  then  u  -  =  2u.  + 

s.  +  s.u.2    .   Here  p.  =  i+1,  and  s.  is  determined  by 
ill  l  l 


s.  = 

i 


if   u.  <  -  3/8 

i 


<  -1   if   u.   >   3/8 

i 


0        otherwise 
Delugish  shows  that  Ix.  -  ll  <  3/8«2~1+1  for  i  >  2,  thus  lx^   -  ll  <  3/8 -2~N, 

so  truncation  error  is  no  more  than  3/4  that  derived  for  the  earlier  examples. 

No  study  was  made  of  roundoff  error,  but  the  bound  depends  on  the  number  of 

iterations  with  s.  4   0.   DeLugish  shows  the  probable  number  of  non-zero  s.'s 

is  about  N/3.   The  maximum  number  is  not  discussed.   However,  it  is  easy  to 
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show  that  not  more  than  two  successive  s.  can  be  non-zero.   For  i  >  2  , 

1  * 

\x     -  1|  £  3/8  •  2~i+1,  thus  lu  I  <  2i.3/8'2"i+1  =  3/4.   Now,  suppose 

u.  >  3/8  and  thus  s.  =  -  1.   Then  u...  =  2u.  +  s.  +  s.u.2_1  <  2-3/4  -  1  =  1/2, 
l  l  l+l     i    l    l  i 

Also  we  have  u  -  >  (2  -  1/4)- 3/8  -  1  =  -  11/32  >  -  3/8.   Then,  suppose 

3/8  <  u#11  <  1/2,  hence  s#11  =  -  1,  since  otherwise  s.,,  =  0.   Then 
l+l  l+l  l+l 

tt.j-o  =  2u-^    +  ^..^...2  ^  (2  "  l/8>l/2  -  1  =  -  1/16,  and  as  before 

1+2     l+l    l+l  l+l 

u.,»  >  -  3/8.   Thus  s.,„  =  0,  as  was  to  be  proved.   If  u.  is  negative,  the 
i+2  i+2  l 

2N 
argument  is  symmetric  and  the  result  follows.   Thus  we  have  that  log-—  +  1 

guard  bits  are  sufficient,  with  chopping. 

The  method  suggested  by  Chen  [5]  used  a  different  approach.   Here  we 

must  use  some  sort  of  procedure  for  counting  left  zeros  in  1  -  x..   We  write 

1  -  x.  =  2  pi  +  v.,  where  0  <  v.  <  2  "1.   Then  p.  is  one  plus  the  number  of 
11  l  l 

left  zeros  in  1  -  x..   With  a.  =  1  +  2  i  we  have 

i        i 

x.Ll  =  (1  -  2~pi  -  v.)(l  +  2_pi)  =  1  -  v.(l  +  2~Pi)  -  2~2pi  . 
l+l  l  i 

Thus  we  see  that  the  transformation  has  eliminated  the  leading  non-zero  bit 

of  1  -  x.,  and  leaves  the  succeeding  bits  largely  unchanged.   Thus,  in  no 

-N-l 
more  than  N  +  1  iterations,  1  -  x.  <  2    .   On  the  average,  one  expects 

only  about  N/2  iterations  would  be  necessary.   The  error  bound  here  is  the 

same  as  for  the  first  two  ideas  presented. 

However,  Chen  suggests  a  termination  algorithm,  to  be  applied  when 

p.  >  N/2.   It  is  simply  a  Taylor  series  expansion  about  the  then  current 

point.   In  this  case,  if  we  have  computed  (x  ,y  ),  (x..  ,y.  ),...,  (x  ,y  ),  with 

p  >  N/2,  then  y/x  ~  y  +y(l-x).   This  requires  a  multiply,  however  note 
n  nn      n 

that  since  p   >  N/2,  1  -  x  is  a  half  precision  number  (i.e.,  the  most 
n  n 

significant  N/2  bits  are  zero),  thus  costing  half  a  usual  multiply  time. 
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-N 
The  truncation  error  bound  is  2   .   Chen  notes  that  since  the  error  is  of 

a  definite  sign,  we  could  halve  the  truncation  error  by  computing 

y/x  =  y  +  y   (1  -  x  +  2~N_1)  . 
n    n       n 

The  roundoff  error  is  halved,  of  course,  since  at  most  N/2  iterations 

need  to  be  performed.   Hence  log  N  guard  bits  are  sufficient  with  chopping. 

The  expected  number  of  iterations  is  N/4. 

4.1.4  Quardratically  Converging  Normalization  Methods 

The  methods  of  Section  2.1.3  converge  linearly,  that  is,  the  number 

of  iterations  required  depends  linearly  on  the  number  of  bits  of  accuracy 

required.   Quadratically  convergent  normalization  methods  are  based  on  the 

Newton  method  for  solving  1/z  -1=0  with  initial  guess  z  =  x  =  x, 

o    o 

where  x  is  the  divisor.   The  calculations  can  be  arranged  as  y  =  y, 

x  =  x,  then  for  i  =  1,2,...,  n, 
o 

Xi  =  Xi-1(2  "  Xi-1} 


y±  -  ylTTl(2  -  Vi}    • 


21 
The  error  1-x.  =E.  =(l-x)   .   Thus  convergence  is  critically 

tied  to  the  accuracy  of  the  "initial  guess".   We  must  have  |l  -  x  |  <  1  for 

convergence. 

We  see  that  if  0  <  x  <  2,  convergence  is  assured,  hence  x  =  x  is 

o  -  o 

adequate  for  convergence,  with  x  in  [1/2,1). 

We  pause  to  consider  the  number  of  iterations  required.   If  we  want 

E  <  2   ,  then  n  must  be  taken  so  that  (1  -  x  )    <  2   ,  or 
n  o 

2n>    N 


-log2|l-xo 
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For  x  in  [1/2,1),  (1  -  x  )  <  1/2,  thus  no  more  than  log^N  iterations  would 
o  o  2 

be  required. 

We  could  improve  this  a  little  by  some  initial  adjustment  of  x  and  y. 
Thus  we  might  take  x  =  ax,  y  =  ay,  where  a  may  or  may  not  be  a  function 
of  x.   To  avoid  an  actual  multiplication  here,  we  could  take 


/ 


a  = 


2   if   1/2  <  x  <  2/3 


1   if   2/3  <  x  <  1  . 

Then  1 1  —  x  I  ^  1/3,  and  log.  - =■  ~  log.  ■= — ^r—  iterations  would  then 

o  °2  log  3      °2   1.58 

be  required.   For  N  =  20,  five  iterations  would  be  required  by  the  previous 
procedure,  and  four  iterations  plus  the  initialization  for  the  latter  pro- 
cedure.  This  example  is  biased  somewhat,  however.   Note  that  if  N  is  a 
power  of  two,  no  advantage  is  had  by  first  initializing  as  above.   For  N 
a  power  of  2,  one  would  need  to  initialize  so  that  |1  -  ax|  <  z     to 
save  k  iterations. 

In  general  the  roundoff  error  doubles  at  each  iteration.   Thus  for 

-N-l 
roundoff  error  to  be  less  than  2     requires  that  J  guard  bits  be  used, 

where  2  (2    )<2     ,orJ=n+l  with  chopping. 

Krishnamurthy  [29]  has  considered  solving  1/z  -  1/K  -  0  instead  of 

1/z  -1=0.   For  a  simple  recursion  for  the  x.'s,  this  necessitates  K  =  2 

1 

for  some  integer  r,  although  K  =  2/3  is  possible.   He  also  considers  using 
lower  precision  multiplies  for  early  iterations.   A  table  of  initial  trans- 
formations is  given  for  computing  a  48  bit  quotient  in  4  iterations  (plus 
the  initial  transformation) . 

The  idea  of  a  table  for  the  initial  transformation  is  used  in  the  IBM 
System  360  Model  91  [1].   With  that  machine  56  bit  accuracy  is  desired,  and 
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-r 


a  table  of  2  =  128  entries  is  used  and  gives  (1  -  x  )  <  2   .   Thus,  after 
the  initial  transformation,  only  three  iterations  would  be  required.   However, 
in  order  to  speed  up  the  early  iterates,  low  precision  multiplications  are 
used  with  truncated  intermediate  results,  and  four  iterations  lead  to  64  bit 
accuracy. 

Ferrari  [19]  proposed  a  method  he  calls  the  optimized  geometric  series 
(OGS)  method.   If  the  two  multiplications  are  to  be  done  serially,  his  method 
is  faster,  since  convergence  is  superquadratic.   If  the  multiplications  can 
be  done  simultaneously,  his  method  has  no  advantage. 

Shaham  and  Riesel  [58]  suggest  a  modification  of  the  Newton  iteration. 

-k  -k 

Suppose  that  1  -  x.  =  2   +  v.,  with  0  <  v.  <  2   .   Then  normally  one  would 

have  1  -  x.+1  =  2  '      +   v.,-,  with  0  ^  v-+i  <  2  '  .   They  suggest  using  as  a 

-2k 
multiplier,  2  -  x.  +  p2    ,  instead  of  2  -  x.,  where  p  is  a  table-look-up 

value  depending  on  the  first  few  bits  of  v..   They  show,  for  example,  that 

a  56  bit  result  can  be  obtained  in  one  less  iteration  than  with  the  classical 

method,  starting  with  an  initial  result  accurate  to  six  bits. 

Ling  [33]  has  given  a  version  of  this  algorithm  which  was  purportedly 

designed  especially  for  32  bit  accuracy,  and  yields  the  quotient  in  3 

multiplication  times.   Ling's  method  is  to  first  transform  the  divisor, 

—8  —8 

x  =  .15-6... ,6_  +  2  (.6a...&   )  into  a  form  x  =  ,d....d_  ±  2   (.dQ..d  ),  i.e., 
2.   5  o         y    n  1/         yn 

with  the  eighth  bit   equal   to   zero.      We  see   that   if  6 _  =   0,    the  plus   sign  is 

taken,    and  d.    =6..      If  5n  =   1,    the  minus   sign   is   taken,    and    ,d...d_  = 
l  l  8  17 

—  8 
•1tS2...6g  +  2      ,    .dg...d     =    (1  -    .5g...6    ).      If  62  =  53  =    ""    =  58  =   ** 

there  is  an  apparent  difficulty  here,  which  washes  out  later  on.   Ling 
implements  the  above  via  logical  circuits  rather  than  by  actual  addition 

y(2k) 

and  complementation.   The  initial  transformation  is  then  y/x  ■  l  prrt  ,  where 
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k  =  l/2-(.d  ..d  )  "  is  obtained  by  table-look-up.   We  then  have  x(2k)  = 

1  ±  2  k(.dg...d  ),  the  ±  having  been  determined  previously.   Ling  indicates 

—  ft  —  T? 

that  the  denominator  is  within  2   of  one,  thus  within  2  '  '  of  one  after 

two  Newton  iterations.   This  is  not  the  case  however,  as  can  be  seen  by 

considering  x  =  . 1000000011. . .1,  whence  k  =  1,  and  2xk  =  1  +  2~7  (. 111. . . 1) . 

Thus  Ling's  algorithm  is  good  for  at  least  28  bits,  but  one  cannot  be  sure 

of  more  than  28  bits. 

4.1.5  Other  Methods 

Dean  [10]  proposes  a  reciprocal  generator  which  forms  the  reciprocal 
of  a  number  in  a  controlled  register  as  the  number  is  entered  serially  into 
a  shift  register.   Formation  of  the  quotient  bits  is  done  by  means  of  logical 
circuits.   He  gives  complete  information  for  four,  five,  and  six  bit  numbers. 
The  logic  would  seem  to  become  quite  complicated  for  high  precision  numbers. 
However,  if,  as  Dean  says,  the  size  of  connectors  negates  the  possiblity 
of  components  becoming  much  smaller,  but  rather  their  complexity  increasing 
to  take  up  (some  of)  the  available  space,  such  a  device  might  be  feasible 
where  speed  is  the  main  consideration.   However,  I  am  assuming  the  logical 
circuits  could  be  implemented  so  as  to  be  very  fast. 

4.2  Arctangent 

Several  authors  have  directed  their  attention  to  calculation  of  the 
arctangent.   While  it  is  not  readily  apparent  when  glancing  through  the 
literature,  it  is  in  fact  true  that  only  one  algorithm  has  been  developed, 
so  far  as  this  author  has  been  able  to  determine.   The  CORDIC  method  was 
the  first  to  appear,  followed  by  Meggitt  [40]  and  Sarkar  and  Krishnamurthy 
[55],  Specker  [61],  and  DeLugish  [16].   The  algorithms  differ  mainly  from 
the  CORDIC  algorithm  in  that  not  all  of  the  rotations  are  performed. 
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Meggitt  keeps  z.  >  0,  Sarkar  and  Krishnamurthy  minimize  |z.|,  Specker's 
method  coincides  with  the  CORDIC  algorithm,  and  DeLugish  uses  a  decision 
process  similar  to  that  for  division. 

We  assume  that  tan  y/x  is  to  be  computed,  where  x  4=  0.   We  will  generally 
assume  that  a  first  quadrant  angle  is  to  be  computed,  hence  an  initial  trans- 
formation using  the  identity  tan   (-y/x)  =  -  tan  y/x  may  be  necessary.   For 
the  CORDIC  algorithm,  we  must  have  x  >  0,  so  the  sign  is  attached  to  whichever 
of  x  or  y  is  appropriate.   If  the  result  of  the  calculation  must  be  in  the  correct 
quadrant,  additional  care  must  be  taken  with  that  step,  and  furthermore,  an 
additional  rotation  through  it  radians  may  be  necessary.   In  the  CORDIC 
algorithm  this  can  be  accomplished  with  no  post-manipulation  of  the  data. 

4.2.1  CORDIC  Algorithm 

As  Table  1  indicates,  tan  y/x  is  obtained  by  the  algorithm  with  m  =  1 
in  the  vectoring  mode.   Since  we  have  discussed  the  general  procedure  in 
some  detail  in  Section  3.1,  we  need  only  to  discuss  the  truncation  error 

bound  for  this  particular  calculation.   As  we  indicated,  the  magnitude  of 

2     2  1/2 
y„  ...  can  be  as  large  as  x„  a   in  vectoring  mode.   Now  x_.  ~  K.  (x   +  y  ) 
•'N-.+l  Nl  N1  N,    1  o    yo 

r\  O  1  /  O    Ml 

so  IYv,  ^i  I  -  Ki  (x   +  y  )   2    .   Consideration  of  equation  (6)  and  some 
N1+l     1  o     o 

yN  +1 

algebraic  manipulation  yields   tan  a  =  y  /x - .   Thus  we  have 

°  r  ooxK, cosa 

o    1 

-1  ~1  Y 

a  =   tan      [y/x     -  y]   =   tan     y/x ' — -       , 

;o  o  o  o   1    2 

1  +  r 

where  r  is  between  y  /x  and  y/x  -  y  •   Thus  the  truncation  error  is 

o  o      o  o 


2  2,2 

1  +  r       1  +  y   /x 
J  o        o 
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2  __  .,   2,   2,1/2  -N-l  -N-l 

x   .  K.  (x  +y   )    2  x  2 

Then  I 1 _|  <  _° 1  o  yo s m   __o 


2/   2       v       /   2  j.  2,       ,2.   2,1/2 
1+y   /x       x  K.  cos  a(x  +y   )       (x  +y   )    cos  a 
;o   o       o  1        o  •'o  o  Jo 

X 

But  — 5 ?  i  /?  ~    cos  a>  hence  the  truncation  error  is  no  more  than 

(xo  ^o  } 

-N-l 
(approximately)  2    .   Roundoff  error  is  likewise  bounded,  hence  total 

error  is  no  more  than  2 

We  will  consider  the  problem  of  initializing  for  both  principal  values 
of  the  arctangent  and  for  values  in  the  correct  quadrant,  the  quadrant 
determined  by  the  signs  of  y  and  x. 

For  principal  value  we  let 

x  =  x  sign  x 

y  =  y  sign  x 
o 

z  =  0   . 
o 


For  the  correct  quadrant  to  be  obtained,  we  take 


x  =  x  sign  x 
o 


yQ  =  y  sign  x 


z   = 
o 


7T   if   sign  x  <  0 


0   if   sign  x  >  0  . 


One  can  also  accomplish  the  initialization  by  always  doing  an  initial 

rotation  of  tt/2  radians,  clockwise  if  y  >  0,  and  counterclockwise  if  y  <  0. 

This  will  also  force  x  >  0.   Then 

o 

x  =  y  sign  y 
o 

y  =  -  x  sign  y 

z  =  tt/2  sign  y 
o 
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4.2.2  Modified  CORDIC  Algorithm. 

We  will  discuss  the  various  modifications  of  the  CORDIC  algorithm  in 
a  unified  manner.   Basically,  all  of  the  modifications  can  be  put  in  the 
same  form  as  equations  (3)  -  (5),  except  that  the  decision  for  s  is  different, 
This  will  change  the  radial  distortion  constant,  but  it  does  not  enter  sig- 
nificantly in  this  case,  anyway. 
Meggitt  takes 


s .  = 

l 


sign  y.    if 


y.  ^  54x- 

i '      i  i 


otherwise 


Sarkar  and  Krishnamurthy  take 

sign  y.    if    y.  -  sign  y.*6.x.   <  ly. 

1  '  1  1X1        i 


s .  = 

i 


otherwise 


-1 


DeLugish  considers  tan  x,  0  ^  x  <  1,  and  takes 


+1   if   y.  <  -  3/8  2 
l 


-l 


s.  = 

i 


-1   if   y  >  3/8  2 


otherwise. 


-i 


with  the  initialization  procedure 


x_x  =  2p(x  -  1)  ,  y_1  =  2p(x  +  1)  , 


with  3  chosen  so  that  x   is  in  [-1,0). 
Then  y.  is  in  [1,2).   We  set  z  =  tt/4, 


x  =  x  .  t  ,  y  =y.t   ,   where 
o    -1  o   o    -1  o 
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3/4    if    0  <  x  <  1/4 


t  =     (5/8    if    1/4  <  x  <  1/2 
o 

k    1/2    if    1/2  <  x  <  1 

Note  that  the  sign  of  the  s.  is  different  because  x.  <  0  for  his  initialization. 

The  error  bounds  for  all  of  the  above  are  the  same  as  for  the  CORDIC 
algorithm,  and  the  roundoff  error  will  probably  be  smaller  in  practice  because 
the  average  number  rotations  is  less  than  N+2.   For  the  DeLugish  method  the 
average  number  of  rotations  (number  of  non  zero  s.)  is  about  N/3. 

4.3  Cosine/Sine 

As  with  the  arctangent  algorithms,  all  of  the  algorithms  for  cosine/sine 
reduce  to  the  CORDIC  algorithm,  or  a  variation  of  it.   We  consider  that  the 
angle,  9,  could  be  any  angle.   This  is  then  reduced  to  the  range  [0,2ir). 
Let  the  reduced  angle  be  denoted  by  Z.   Then  we  can  find  z  in  [0,tt/2)  so 
that  Z  =  tt/2  •  Q  +  z,  where  Q  =  0,1,2,  or  3.   We  compute  sin  z  and  cos  z, 

and  then 

( 

sin  z   if   Q  =  0 

cos  z  if  Q  =  1 
-sin  z  if  Q  =  2 
-cos  z   if   Q  =  3 


sin  Z  =  sin  9  =  \ 


and 


cos  Z  =  cos 


cos 

z 

if 

Q  = 

0 

-sin 

z 

if 

Q  - 

l 

-cos 

z 

if 

Q  = 

2 

sin 

z 

if 

Q  - 

3 
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The  proper  values  can  be  obtained  either  by  pre  -  or  post  -  manipulation 
of  the  data. 

4.3.1  CORDIC  Algorithm 

By  Table  1  we  see  that  if  one  enters  the  algorithm  with  m  =  1,  x  =  K    , 
y  =  0,  and  z  in  the  rotation  mode,  one  obtains  cos  z  and  sin  z  for  |z|  <  it/2. 
From  equation  (6) 

x^  ,,  =  K-  (K    cos  a)  =  cos  a 

y„  ,,  =  K  (-K   sin  a)  =  -  sin  a  . 

Now  z  +  a  =   z„  , ,   <  a„  ,  thus 
'      '    '  N-  +1 '     N1 


x^  ,  1  =  cos  (z  -  y)  =  cos  z  +  sin  z*  •  y. 


1 


y  +1  =  sin  (z  -  y)  =  sin  z  -  cos  z  •  y* 

where  z*  and  z  are  each  within  lyl  of  z,  and  IyI  -  a«  =  tan  2     ~   2 

N-. 

Then,  since  sine  and  cosine  are  bounded  by  one,  the  truncation  error  is 

-N-l  -N-l 

bounded  by  2    .If  roundoff  error  is  also  bounded  by  2    ,  total  error 

-N 
is  bounded  by  2 

4.3.2  DeLugish  Modification 

Because  the  distortion  constant  K  depends  on  the  rotations  performed,  e„ 

1  Rx 

the  usual  DeLugish  [16]  modification  will  not  work.   Recall  that  the  distortion 

in  the  step 

x.M  =  x.  +  s.6  .y . 
l+l    l    ill 

J  i+l    i    i  i  i 

2  1/2 
is  (1  +  6 .  )    .   A  correction  to  remove  the  distortion  would  require 


-34- 


2  1/2 
division  of  each  variable  by  (1  +  6   )    .   This  is  not  feasible,  of  course. 

0    1/0  0  A 

But  (1  -  Si  )     =  1  -  1/2  5±  +  3/8  6±  ...,  and  if  6   is  small  enough  for 

-N-l  2-1/2  2 

the  third  term  to  be  less  than  2     ,  we  have  (1  +  6 .  )  '    =  1  -  1/2  6.  , 

rounded  to  N  digit  accuracy,  and  such  a  correction  could  be  made  with  a 

2  -1/2 
shift  and  subtract,  since  6 .  is  a  power  of  two.   Eventually,  (1  +  6   )     =  1, 

rounded  to  N  bit  accuracy,  and  then  no  correction  is  required.   This  idea 

is  used  by  DeLugish  to  introduce  redundancy  into  the  cosine/sine  calculation. 

We  consider  0  ^  z  <  tt/2  .   The  initialization  is 

z     if    z  <  tt/4 


then 


z  = 
o 


tt/2  -  z   if 


>  tt/4 


V 


(xl»yl>  = 


and    z  =  z  -  tt/4, 
1    o 


(1/K*  ,  (1/K*)  tan  tt/8)  if  z  <  tt/4 


((1/K*)tan  tt/8,  1/K*)   if  z  >  tt/4 


where 


* 

K  =  (1 


/$nl/2  N        -j+1.1/2 
+  tan  tt/8)    II   (1  +  2    ) 

j=0 


Here  N  =  [—7 — ]  +  1,  the  number  of  iterations  to  be  performed  before  shift 
and  subtract  corrections  can  be  made  for  the  radial  distortion.   The  iteration 
equations  are 


=  (x.  -  s.y.2  X  1)T, 


Li+1 


1  1 


y-j-n  =  <y-  +  s-  x.2"1"1)!. 
1+1     1    11      1 


_   -l0-i-l 

z,,,  =  z .  -  s . tan   2 
l+l    1    1 


where  s .  is  determined  by 
i 
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for  i  =  1,2,. . ,  N   , 


s.  = 


for  i  =  N  +  1,. ..  ,  N 
and 


*'-  i 


i=    < 


■1    if    z.  <  0 

i 


1    if    z  >  0 


■1    if    z.  <  -  3/8  2 


-i 


1    if    z±   >  3/8  2 


0    otherwise 


-i 


1    for    i  =  1,2,. ..,N 


2  -(2i+3}  *        ** 

T.  1  -  s.   2  U1  J;   for  i  =  N  +1,...,  N 


** 


for   i  =  N  +1,...,  N  . 


**    N-3  2  -1/2 

Here  N   =  Hr-]  +  1,  the  point  at  which  (1  +  6 .  )     =  1,  to  N  bit  accuracy. 

The  above  was  as  given  by  DeLugish.   DeLugish,  however,  did  not  consider 

the  effects  of  roundoff  error,  and  his  examples  were  done  with  N  =  40,  and 

at  least  13  guard  bits.   For  purposes  of  roundoff  error  control,  the  values 

of  N  and  N   should  be  determined  on  the  basis  of  N  +  J  bit  accuracy,  rather 

*      **       N+J-6, 
than  N  bit  accuracy.   Thus,  one  should  take  N  and  N   to  be  [ — 7 — ]  +  1 

A      rN+J-3-,      .  ..    - 

and  [ — - — J  +  1,  respectively. 

The  truncation  error  is  similar  to  the  CORDIC  algorithm.   The  roundoff 

N+J 
error  bound  is  complicated  by  the  fact  that  about  —7—  iterations  are  performed 

which  may  require  corrections  T.,  with  T.  +  1.   It  is  conceivable  that  nearly 

all  rotations  (all  s.)  will  be  non-zero,  thus  the  roundoff  error  bound  will 

be  about  25%  larger.   However,  the  avarage  number  of  s.  equal  to  zero  should 

3N 
be  about  — r  ,  so  on  the  average,  roundoff  error  will  be  smaller. 
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4.3.3  An  Alternate  Procedure 

We  have  seen  in  the  previous  section  that  the  radial  distortion 
constait  complicates  the  usage  of  no  rotation  at  any  step.   If  one  were  to 
use  a  scheme  such  as  the  DeLugish  scheme  for  computing  the  tangent  (See 

fan   2 

Section  4.7.2),  the  sine  could  then  be  computed  from  sin  z  =  


2   1/2  * 
(1  +  tan  z)  ' 


This  type  of  algorithm  was  implemented  in  the  Hewlett-Packard  HP-35  calculator 
[6],   In  addition  to  the  fact  that  the  calculations  are  radix  10,  the  goal 
there  was  compactness  of  the  program,  rather  than  speed.   The  above  identity 
is  certainly  not  likely  to  achieve  high  speed. 

4.4  Exponential 

The  calculation  of  the  exponential  function  has  been  treated  by  several 
authors.   Normalization  is  the  principal  technique,  although  the  other  unified 
algorithms  also  can  be  used. 

We  will  generally  allow  the  argument  to  be  any  number,  X.   The  range 
will  be  reduced  by  the  transformation  X  =  Q  log2  +  x,  where  Q  is  an  integer, 
and  0  <  x  <  log  2.   Then  eX  =  eQ  l0g  2  +  X  =  2QeX.   Note  that  1  <  e*  <  2. 

4.4.1  CORDIC  Algorithm 

Table  1  yields  the  information  that  entering  the  algorithm  with  m  =  -  1 
in  the  rotation  mode,  taking  x  =  1/K_1 ,  y  =  0,  and  z  =  argument,  the  result 
of  the  calculation  will  be  x^   ,-  ~   cosh  z,  y      z    sinh  z.   Then  e  = 

cosh  z  +  sinh  z  z   x^  +1  +  YN  +1  • 


We  will  use  log  u  to  denote  log  u  =  In  u,  while  any  other  base  will  be 
specified  by  subscript. 
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The  truncation  errors  are  given  by 


at.   ,,  -  cosh  z  =  cosh  a  -  cosh 


=  cosh  (z  -  z„   . , )  -  cosh  z 
o    N  -+1 

=  -  (sinh  z  )  zN  +1  , 


and  similarly, 


N_..+l  -  sinh  z  =  -  (cosh  z)  z    -  ,  where  z  and  z  are  between 

z  and  z  -  z.T   .,  .   Then,  the  truncation  error  for  e  is  the  sum, 

N_ ..  +1  * 

*       z  ii    -N— 1 

-  z.T   ,.  (sinh  z  +  cosh  z  )  z   -   e  zXT     .   Since   z„   ,,   <  2 
N_,+l  N_i+1  N  -+11 

-N-l    -N 
and  0  ^  z  <  log  2,  the  bound  is  2* 2     =2   .   Roundoff  error  in  each  is 

-N-l  -N 

2     ,  hence  the  total  roundoff  error  could  be  as  large  as  2   ,  giving 

-N+l 
a  total  error  of  no  more  than  2    .   However  recall  that  a  right  shift  is 

-N 
necessary  to  normalize  the  result,  thus  the  error  is  then  bounded  by  2 

4.4.2  Pseudo-Division/Multiplication  Methods 

From  Table  4  we  see  it  is  first  necessary  to  recode  the  exponent  in 
in  the  variable  radix  log  (1+2  )   This  is  accomplished  by  the  modified 
divider  discussed  in  Section  3.2.   Let  us  first  investigate  the  error  in 
the  process  of  recoding  the  exponent.   Without  rounding  error,  and  chopping 

after  N  bits  it  is  easily  seen  that  the  truncation  error  in  the  process  is 

-N     -N 
bounded  by  log  (1+2   )  =  2   .   The  roundoff  error  is  introduced  solely 

through  the  divisors,  and  is  left  shifted  in  the  new  dividend  at  each 

iteration.   If  the  error  in  divisor  d.  is  e.,  we  have  error 

i     i 

2N       2N_1  e        -  2  £ 

^o  o       ^11  q„  N  in  D„,,.   Thus  the  error  accumulation  in 

^N       N+l 

N+l 
remainder  is  the  above,  divided  by  2    .   Assuming  correct  rounding  of  the 
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divisors  the  roundoff  error  accumulation  is  bounded  by 

2"N-1(2N+  ...  +  2)2-N-1  S  =  a"""1. 

-N-l 
Thus  if  no  guard  bits  are  used,  roundoff  error  is  less  than  2    .   Thus 

_N    -N-l 
the  total  error  in  this  procedure  could  be  as  large  as  2   +2       If  we 

p— Y         i  I  — N 

take  y  =  x  =  1  in  the  procedure,  we  obtain  e    ,  where  |y|  <  3/2  •  2 

* 

P      P— Y      P  A 

Thus  the  error  here  is  e  -  e    =  e  y    ,  where  p   is  near  p.   Thus  the 
error  is  bounded  by  e     |y|  <  2  •  3/2  2   =3*2 

Now  in  the  pseudo-multiplication,  roundoff  can  occur  only  in  the 

calculation  of  the  pseudo-multiplicand,  x..   The  error  propogated  to  the 

N-l       N 
pseudo-product  z...,,  then  is  2   q  a     +  2  q.en  +  ...  +  2q„ev,  where  e.  is 
r      r        N+l  ^o  o     1  1  ^N  N*        i 

i   i     _N 
the  error  in  x,- .  e      =  0,  and   e.   <  2   ,  so  the  total  error  is  bounded  by 
1    o  '  1  ' 

N    N-l  -N 

(2  +2    +  ...  +  2)2    <  2.   Because  the  accumulation  is  in  a  double 

length  register  (the  least  significant  part  being  N+l  bits),  we  see  that 

_N-1    _n 
the  actual  error  is  no  more  than  2*2     =  2   . 

-N    -N 
Thus  the  total  error  in  the  procedure  is  bounded  by  3  •  2   +2   = 

-N+2 
2    .   Recall  that  a  right  shift  is  necessary  for  normalization  which 

o-N+1 
reduces  the  error  to  2 

We  note  that  the  above  was  accomplished  without  guard  bits,  and  one 

double  length  register.   The  inclusion  of  one  guard  bit  would  reduce  this 

-N 
error  to  2   .   However  it  would  be  necessary  to  include  an  additional 

quotient  bit  in  the  modified  division  to  recode  the  exponent,  since  this 

was  the  largest  source  of  error. 

4.4.3  Normalization  Methods 

The  basic  algorithm  has  been  described  previously,  the  choice  of  a. 
being  the  issue  to  be  decided.   Specker  [61]  and  later  Perle  [46]  use  the 
criterion 
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1  +  2  x  1    if   x.  -  log(l  +  2  i~1)  ;>  o 


"• 1 . 


otherwise. 


.-i 


Then  x.  >  0  for  all  i,  and  it  is  easy  to  show  that  x.  <  2   ,  thus  after  N 

iterations  x^^  <  2    .   Then  we  have  y  ^e     =  y  e  ,  and  taking 

x   .  r  ,   *N+1   ,s  „  x   0-N-l 

yN+1  s  e  gives  an  error  of  yN+1(e     -  1)  -  yN+1  .  x^  <  e   •  2 

-N 
Since  0  <  x   <  log  2,  the  truncation  error  is  seen  to  be  bounded  by  2 

-N-l 
Roundoff  error  is  bounded  by  2    ,  assuming  a  sufficient  number  of  guard 

bits,  and  since  the  result  must  be  right  shifted  to  normalize,  the  total 

error  xs  bounded  by  2     +2     <  2 

DeLugish  [16]  initializes  by  taking 


where 


e  = 
o 


xl = 

x  - 

log  eQ 

yl  =  eo 

> 

' 

1/2 
e 

if 

* 
x  >  1/2 

< 

1/4 
e 

if 

1/4  <  x  <  1/2 

h 

1 

if 

x  <  1/4 

Then  a.  =  1  +  s.2     is  chosen  by 
li 


Si  = 


-1      if 
1      if 


x.  <  -3/8-2 

l 


-i 


x  >  3/8-2 


-l 


0 
,-i+l 


,-N 


otherwise 
It  is  shown  that  |x  |  <  3/8-2  "L"ra',  hence  |x^-|  ^  3/8-2  ™,  so  truncation 
error  is  slightly  smaller  than  in  the  previous  case.   The  number  of  non-zero 


DeLugish  actually  allows  |x|  <  log  2,  and  thus  two  more  possible  initilization 
values  are  given  by  him. 
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s.  is  about  N/3.   One  can  again  show  that  (for  moderate  i)  that  no  more  than 

two  successive  s.  can  be  non-zero,  thus  the  maximum  number  of  non-zero  s 
i  i 

is  about  2N/3,  decreasing  the  number  of  guard  bits  required.   Total  error 
is  bounded  by  3/4  •  2~N_1  +  2~N~2  =  5/4  ■  2~N~1  <  2~N. 

Chen  [5]  lets  p.  be  one  plus  the  number  of  leading  zeros  in  x . ,  i.e., 

x.  =  2"Pi  +  v.,  0  <  v.  <  2~Pi.   Then  log(a.)  =  log(l  +  2~Pi)  =  2~Pi  + 

0(2   i),  and  the  leading  one  bit  in  x.  is  eliminated.   At  most  N  +  1 

-N-l 
iterations  will  be  required  for  x.  <  2    .   The  termination  algorithm,  to 

be  applied  when  p   >  N/2  is  e  z   y  +  x  y  .   Again,  as  with  division, 
n  n    n  n 

truncation  error  can  be  halved  by  taking  e;y+y(x+2    ).   The 

n    n  n 

-N 
total  error  bound  is  again  less  than  2 

4.5  Power  Function 

y 

The  power  function,  x  ,  has  not  received  much  attention  in  the  literature, 

y   y  Ior  x 
This  function  is  usually  evaluated  by  the  identity  x  =  e       ,  which 

requires  evaluating  both  a  logarithm  and  an  exponential,  with  a  multiplication 

necessary  also. 

A  direct  method  was  proposed  by  Krishnamurthy  [28].   He  assumed  the 

numbers  were  in  radix  r,  but  we  specialize  this  to  r  =  2.   We  first  write 

y  =  I  +  f ,  where  I  is  an  integer  and  0  ^  f  <  1.   x  can  be  evaluated  by 

n 

v    —  i 
multiplications  and   a  division,  possibly.   Let  us  write  f  =  I   q . 2   .   Then 

i=l  X 

xf  =  x  I   ,  qi2_1  =  n   (x2_1)qi 
1=1       i=i 

1/2  1/2 

If  we  let  z  =  x    ,ztl1=z.    ,  for  i  =  1,2,...,N  -  1,  we  have 
1  l+l    l 

f    N     q  f 

x  =  II   z   i  .   We  can  determine  x  with  the  following  loop. 

i=l   i 
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Let  p   =  1,  z   =  x,  then 
o       o 


for  i  =  1,2,. ...N 

1/2 

Zi  =  Zl-1 


Pi  =  Pi-1  '  Ziq± 


'1-1 


'i-l"i 


if 


If 


q±-l 


Then  x 


'N 


This  method  would  seem  to  be  a  rather  time  consuming  procedure,  since 
N  square  roots  and  up  to  N  multiplications  are  necessary  for  only  the 
fractional  part  of  the  exponent.   It  seems  unlikely  that  it  would  compare 
favorably  with  the  conventional  method,  unless  a  square  root  operation 
were  available,  and  a  logarithm/ exponential  routine  was  not. 

Roundoff  error  would  be  able  to  come  in  both  during  the  square  rooting 
operation  and  during  formation  of  the  product,  and  would  thus  require  more 
guard  bits  than  usual. 

4.6  Logarithm 

Since  the  logarithm  is  the  inverse  of  the  exponential,  it  seems  reason- 
able that  the  inverse  of  methods  used  for  the  exponential  could  be  used  for 
the  algorithm.   This  is  essentially  true.   Most  routines  are  for  any  base 
logarithm,  depending  on  the  stored  constants,  but  one  due  to  Philo  [48]  and 
Dean  [15]  is  for  base  2  logarithms. 

We  will  consider  the  logarithm  of  X,  where  X  is  any  positive  number. 
We  then  write  X  =  2°  •  x,  where  1/2  <  x  <  1,  and  then  log  X  =  a  log  2  +  log  x, 
It  is  then  necessary  to  discuss  the  computation  of  log  x. 
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4.6.1  CORDIC  Algorithm 

-1  w-1 
Here  we  will  make  use  of  the  identity  log  w  =  2  tanh  "  — —  .   If 

w+1 

1/2  <  w  <  1,  then  -1/3  <  7777-  <  0,  a  region  for  which  the  CORDIC  algorithm 

converges.   We  enter  the  algorithm  with  m  =  -  1  in  the  vectoring  mode,  with 

z  =  0,  x  =  w+l  and  y  =  w  -  1.   The  value  of  z„   , ..  must  then  be  left 

N-+1 

shifted  to  obtain  the  result. 

Truncation  error  is  found  by  consideration  of  equation  (6) ,  by  a 

procedure  similar  to  that  for  the  inverse  tangent,  and  we  obtain  the  bound 

6    .   Roundoff  error  is  of  similar  size,  however,  these  bounds  are  for  the 

-1 
hyperbolic  tangent,  which  must  be  left  shifted  to  obtain  the  logarithm 

j  j      o-N+1 
thus  the  error  bound  is  2 

4.6.2  Pseudo-Division/Mulitplication  Methods 

This  is  a  pseudo-division  process,  and  can  be  carried  out  in  one 

pass,  as  is  seen  from  Table  3.   Normally,  log  z  is  required,  and  this  can 

1-z 
be  obtained  by  setting  y  =   1  -   z,    x  =   z.      This   gives   log(l  +  y/x)   =   log(l  +  ~~T~) 

-   log  z.      This  particular  transformation  keeps  x  and  y  small  and  non- 
negative,  as  is  desirable  and  necessary. 

N+l 
Using  no  guard  bits,  we  find  that  the  error  in  z   ..  is  -2   q  e      - 

N  1   1     -N 

2  q-e,  -  ...  -  2q  eT  .   Since  e     =0,  and   e.   <  2   ,  we  have  the  error 

1  1  N  N  o  '  1 ' 

N    N-l  -N  N+l 

bounded  by  (2  +2    +  ..  +  2)2   =  2.   The  pseudo-remainder  is  z   ../2    , 

-N 
hence  the  error  is  bounded  by  2   .   Now  the  accumulation  of  the  psuedo- 

dividend  requires  the  use  of  guard  bits.   If  it  is  done  in  the  double  length 

register,  as  suggested  by  Meggitt,  roundoff  error  will  be  insignificant,  and 

-N 
the  total  error  will  be  less  than  2 
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4.6.3  Normalization  Methods 

Specker  [61],  Perle  [46],  DeLugish  [16],  and  Chen  [5]  have  suggested 
these  algorithms,  and  the  selection  of  a.  is  the  same  as  for  division,  since 

in  each  case  it  is  desired  to  force  x  to  1.   The  error  bounds  are  about 

i       i     -N-l  i       i     -N-l 

the  same.   If   1  -  x   <  2    ,  then  the  error  is  log  1  -  x  \    z    2  ,  as 

n'  '     n' 

-N-l 
before.   Roundoff  error  again  can  accumulate  up  to  2     for  a  total  error 

of  2 

Chen's  termination  algorithm  is  y  +  log  x  ~  y  -  (1  -  x  ) ,  or  to 

n        n 

_N-2 
reduce  the  truncation  error,  y  +  log  x=y  -  (1  -  x  +2    ).   Again  n 

n         n 

is  taken  so  that  p  >  N/2.   Comments  pertaining  to  errors,  made  in  Section 
n 

4.1.3,  apply  here  also. 

4.6.4  Other  Methods 

Philo  and  Dean  suggested  the  following  method  for  computing  log  x. 
It  is  the  inverse  of  the  procedure  described  in  Section  4.5  for  computing 
x^.   It  was  discussed  by  Dean  in  terms  of  implementation  using  cellular 

arrays. 

N     ^ 

I  is1 

i=l 
We  want  to  compute  digits  q.  such  that  x  =  2         .We  assume 

that  1  <  x  <  2  so  that  0  <  log2x  <  1.   Then  x  =  21/2*ql  ■  21/4'q2  ...   • 

1  /  9^  n  9  2 

•2     'qn  .   If  x  >  2,  then  it  is  apparent  that  q  =  1.   If  x  <  2,  then 

q  =  0.   The  following  recursion  can  be  seen  to  generate  the  q  . 

x  =  x,   for  i  =  1,2,...,  N, 
o 

1    if    x1_12  >  2 


qi  = 


if    x.  ,2   <   2 
l-l 
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and 


x.  = 

1 


1/2- 


x 


i-1 


Xi-1 


if    q.  =  1 
l 


if    q.  =  0 
l 


-N 
The  truncation  error  of  the  above  procedure  is  2   if  enough  guard  bits 

are  carried  in  the  calculation  of  the  x  .   The  error  will  generally  more 

than  double  at  each  iteration,  although  a  possible  right  shift  will  tend 

to  keep  it  about  the  same  magnitude.   For  example,  if  the  error  at  the 

.  ,th       .        ., 
l-l   stage  is  £._i>  then 


€  .  = 

1 


1/2.<2X1_1£±_1+£1_1,)    if    q   -1 


2Xi-l£i-l  +  £i-l 


if    q.  =  0  . 
l 


1/2  1/2 

Considering  that  q,  =  1  or  0  as  x,  .  >  2    or  x.    <  2    ,  we  have 

l  l-l  l-l 


l/2-(2'2«e.  ,  +  e.  _2)  =  2e.  .  +  l/2e .   2  -  2e .    . 
l-l    l-l       l-l       l-l   ~   l-l 


€  .  < 
1 


o  o1/2     a.      2 


93/2  2    3/2^ 

l-l    l-l  -      i-l 


This  does  not  include  the  error  due  to  chopping  after  the  squaring  operation, 

-N-J 
which  would  be  bounded  by  2    ,  using  J  <  N  guard  bits.  e      is  zero,  so  we 


,3/2 


-N-J 


,3/2/0-N-Jv         0-N-J 


have    leJ    <   2J/Z-0  +  eD     =   2_iN_J  •      Then    |6    |  (2        ")  +  2 

1  Rx 


(2    +  1)2    .   Continuing  in  this  fashion,  or  by  induction,  one  obtains 


i„\    <-    [(iS/2)M+  (23/2)N-2+...+  l]2-N-J, 

N 


or 


e  |  <  (23/2)N-  2"N_J=  2N/2  "  J.   Thus  the  use  of  N/2  guard  bits 
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-N 
will  prevent  buildup  of  error  larger  than  2 

Because  repeated  squaring  is  required,  this  method  is  probably  not  too 

attractive,  even  though  it  is  a  simple  calculation. 

Nicoud  and  Dessoulavy  [45]  suggested  a  method  which  requires  repeated 

multiplication.   Suppose  log  x  is  desired,  where  1  <  x  <  2.   Now,  for  an 

-k  n 

appropriate  number  u  =  1  +  2   ,  we  consider  the  sequence  l,u,...,u  ,  until 

u  >  x.   Then  we  have  log  x  s  log  u  =  n  log  u.   The  truncation  error  is 

| log  u  -  log  x|  ^  log  u  -  log  u 

=  log  u  a  2~\ 

hence  we  would  probably  take  k  =  N  +  1.   The  error  introduced  by  forming 

the  sequence  l,u,...,u  is  bounded  by  n  2    .   Unfortunately,  for  x  z   2, 


we  would  require 


n  ,  1°JJ a  3.2N_1 

-N-l 
log (1+2  N   X) 


thus  the  number  of  operations  required  for  high  precision  is  excessive. 
About  N  +  1  guard  bits  would  be  required. 

4.7  Tangent 

Several  authors  have  discussed  generation  of  the  tangent  function, 
including  Meggitt  [40]  and  DeLugish  [16].   Of  course,  the  tangent  can 
always  be  generated  as  the  quotient  of  the  sine  and  cosine,  say  from  the 
CORDIC  algorithm.   Both  Meggitt  and  DeLugish  use  this  idea,  with  modifications. 

We  assume  that  the  angle  is  reduced  as  in  the  sine/cosine  routines. 
Thus  we  have  a  first  quadrant  angle.   This  is  further  reduced  by  the  identity 
tan  z  =  cot(7T/2  -  z)  if  z  >  tt/4. 

Meggitt' s  initial  modified  division  is  equivalent  to  finding  which 
rotations  need  to  be  done  to  drive  the  angle  to  zero,  and  doing  only  those 
which  do  not  "over-shoot"  zero.   The  arbitrary  value,  Q,  should  be  approxi- 
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mately  of  full  register  length,  near  one.   The  accumulation  in  the  double 
length  register  renders  roundoff  error  to  be  insignificant.   Error  in 
recoding  the  angle  is  about  the  same  as  was  discussed  in  the  case  of  the 
exponential  function,  and  one  guard  bit  and  N  +  1  iterations  (n  =  N  +  2) 
are  required  to  bring  the  truncation  error  bound  down  to  the  desired  level. 

DeLugish  use  the  same  initilization  as  for  sine/cosine,  and  the  same 
decision  process  as  is  used  in  the  latter  stages  of  that  procedure.   Of 
course,  the  radial  distortion  correction  is  not  necessary  for  any  of  the 
iterations. 

The  errors  in  calculation  of  the  sine  and  cosine  were  found  to  be 

-N 
bounded  by  2   ,  and  these  bounds  hold  for  the  modified  procedures  here.   Then 

„  .  e„ 
K  sin  z  -  e,  .  /vN  /n  2 
1      _      (tan   z  -   e.,/K)(l  + 

K  cos   z  -   e„ 


1  K  cos   z 


e„tan  z 
tan  z  -   £]/K  +  — ^7T 


£2|  ,.  zr^-N 


en      +  J ' —  <    (1  +   /2)2 

1  cos   z 

However,  separate  consideration  of  the  truncation  and  roundoff  errors  in 

K  sin  z  and  K  cos  z  will  yield  a  better  bound,  about  (1  +  l//2)2   ,  an  error 
confined  to  the  last  two  bits  of  the  result.   For  angles  greater  than  tt/4, 
the  reciprocal  is  taken,  and  the  error  becomes  arbitrarily  large.   This  is 
natural,  however,  since  the  value  of  the  tangent  also  becomes  arbitrarily 
large. 

4.8   Square  Root 

Computation  of  square  roots  has  received  a  great  deal  of  attention  , 
Lenaerts  [30]  was  one  of  the  first  to  discuss  the  possibility  of  a  build-in 
square  root  operation;  his  suggestion  was  for  a  particular  type  of  existing 
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computer,  and  was  an  adaption  of  the  usual  hand  method  for  extracting  square 
roots.   Cowgill  [8]  gave  a  non-restoring  version  of  the  usual  hand  method. 
Kostopoulos  [26]  and  others  have  also  suggested  adaptions  of  the  basic  hand 
calculation.   Metze  [42]  gave  an  algorithm  which  generates  the  square 
root  in  a  minimally  represented  redundant  form.   The  pseudo-multi- 
plication of  Meggitt  [40]  and  Sarkar  and  Kirshnamurthy  [55]  will  calculate 
square  roots,  as  will  the  CORDIC  algorithm.   Chen  [5]  and  DeLugish  [16]  give 
normalization  techniques  for  the  calculation.   DeLugish  gives  two  algorithms 
for  the  square  root.   One  is  an  additive  normalization  procedure  which  is 
actually  a  modified  hand  method.   We  will  discuss  iterative  techniques  briefly. 

We  assume  that  X  =  2   .  x,  where  1/4  <  x  <  1,  then  X    =  2  .  io7,  and 
1/2  <  toe  <  1.   We  concern  ourselves  with  the  computation  of  /x    . 
4.8.1  CORDIC  Algorithm 

Suppose  rw   ,   1/4  <  w  <  1  is  to  be  computed.   Then,  setting  x  =  w  +  1/4, 
y  =  w  -  1/4,  and  entering  the  algorithm  with  m  =  -1  in  the  vectoring  mode 

x  -y  =  K   •w  .   Unfortunately,  a  multiplication 

by  K     is  then  necessary  to  obtain  /w  ,  taking  an  additional  multiplication 

-2 
and  increasing  roundoff  error.   An  initial  formation  of  wK_1     could  be  done, 

-2  -1 

also,  but  would  require  that  K     be  stored,  as  well  as  K- 

The  truncation  error  for  the  calculation  of  K   w     is  a  lengthy  calcu- 

i  «-•      a    ■  •   -  i     K  i  (w2-w+3/16)  - 

lation,  and  is  approximately  -  _11 6N      where  o  =  ^-1  w  -  1/4 

(w+1/4)  cosh  a    -1  w  +  1/4 

-N-2 
Hence  the  truncation  error  is  bounded  by  K  •  2    .   Roundoff  errors  are  the 

same  as  for  other  applications  of  the  CORDIC  algorithm,  so  the  error  in 

K   •w  is  bounded  by  2  ~  +  K_1 2    .   The  division  (or  multiplication) 

would  introduce  additional  error. 
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4.8.2  "Hand"  Methods 

The  usual  hand  method  for  extracting  square  roots  is  a  matter  of 
trial  and  error.   Let  us  outline  and  review  the  procedure  for  radix  two. 

Suppose  we  have  a  number  x,  1/4  £  x  <  1,  and  we  have  computed  digits 

i-1 
q  ,q9,...,q    so  that  &•_-,  =  I        1.2   is  as  large  as  possible,  and  still 

j=l  J 

2 
satisfies   x-R    >  0.   Then  we  take  q.   to  be  zero  or  one,  depending 

on  whether  x  -  (R._-  +  2   )    is  negative  or  non-negative.   If  we 

2 
let   E.  =  x  -  R.  ,  then 
l        l 

E.  =  X  -  (R.  ,  +  q.21)2  =  E.  .  -  q.2"1+1(R.  ,  +  q.2"1"1). 
l         x-1    l        l-l    l       l-l     1 

Thus,  we  see  that  a  trial  value  of  E.   can  be  computed  by  subtracting 

2    (R.  -  +  2    )  from  E.  n.      If  the  result  is  negative,  q.  =  0  and 
l-l  l-l  l 

E.  =  E    .   If  the  result  is  non-negative,  q.  =  1,  and  E.  has  been  computed. 

For  machine  implementation  it  is  more  convenient  to  deal  with 

M.  =  2  E.,  rather  than  E..   The  calculations  can  then  be  arranged  as 
ill 

follows.   Since   1/4  <  x  <  1,  q.  =  1J  thus  we  can  initialize 

Mx  =  2(x  -  1/4) 

Rx  =  1/2 

For  i  -  2,3,. . . ,N 

T.  =  2M.  .  -  (2R.  .  +  2_1). 
l     l-l      l-l 

Then,  if  T.  >  0 
l 

q±=l 

M.  =  T. 
l    l 

R.  =  R.  .  +  2_1  ,  or 
l    l-l 

if 


T. 

l 

< 

0 

qi 

= 

0 

M. 

l 

= 

2M. 

l 

-1 

R. 

i 

= 

R. 
l- 

1 
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This  is  seen  to  be  a  restoring  type  of  process,  since  a  trial  value  is 

computed,  and  then  may  be  discarded. 

-N 
It  is  easily  seen  that  the  truncation  error  is  2   ,  since  we  have 

computed  the  first  N  bits  of  the  square  root,  and  the  remainder, 

oo 

v       -i    -N 

I  q.2   <  2   .   As  we  have  outlined  the  procedure,  no  roundoff  error 

i=N+l  1 

occurs  since  no  numbers  are  right  shifted  off  the  register,  and  all  other 
numbers  are  represented  exactly  with  N  fraction  bits.   We  should  note  that 
M.  can  be  large  enough  to  require  some  bits  to  the  left  of  the  binary  point, 
however.   This  might  be  best  handled  by  scaling  and  inclusion  of  an  appro- 
priate number  of  guard  bits. 

The  above  procedure  can  easily  be  converted  to  a  non-restoring  square 
root  routine.  The  version  we  give  is  not  generalized  the  same  way  as  the 
non-restoring  division  routine,  however. 

Let  M  =  2(x-l/4) 
Rx  =  1/2 

For  i  -  2,3,..  .,N 


qi  = 


/l   if  M.    >  0 

-1   if  M.  .  <  0 
l-l 


V 

then  R.  =  R.  .  +  q.2~ 
l    l-l    ni 

M.  =  2M.  ,  -  q.(2R.  .  +  q.2"1). 
l      l-l    Mi    l-l    ni 

The  error  bounds  are  the  same  as  for  the  restoring  square  root  procedure. 
Metze  has  given  a  non-restoring  algorithm  which  yields  the  value  in 
a  minimally  represented  redundant  form.   As  with  the  Metze  division  algorithm  [41] 
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there  is  overlap  in  the  regions  where  a  digit  may  be  chosen  as  zero  or 
non-zero.   We  outline  the  algorithm  as  given,  and  then  discuss  the  overlap 
and  a  possible  compromise.   Let 
R   =  x 

s-i  -  ° 

2 
Here  R.  will  correspond  to  the  partial  remainder  x-S .   ,  where  S.  is  the 

approximation  to  the  square  root  after  i  steps. 

For  i  =  0,1,..., N 

±0 


K.-U  =  2   (±5/3.S    +  25/36-2"1) 


K 


."  =  2  1(±4/3.si_1  +  4/9-2"1) 


Determine  q.  by 


qi  = 


{ 


Then 


0  if   K."°  <  R.  ,  <  K.+0 

1        1-1      X 

1  if  K.+1  <  R.  , 

l      l-l 

-1   if  K."1  >  R.  , 
l      l-l 


R.  =  R.  .  -  q.2  1(2S.  ,  +  q.2  1) 
l     l-l     l       l-l    ni 


S.  =  S.  .  +  q.2 
l    l-l    ^i 


-l 


Since  K.    <  K.  "  <  K.    <K.    ,  it  is  not  well  defined  in  the  above 
1111 

how  q.  is  to  be  chosen  if  R._-,  is  in  one  of  the  overlapping  regions.   It 

can  be  chosen  either  way  one  prefers;  the  representation  will  still  be 

±0       ±1 
minimal.   Since  the  above  values  of  K.    and  K    are  not  particularly  easy 

to  compute,  a  reasonable  suggestion  would  seem  to  be  to  compromise  at  some 

point  in  the  regions  which  is  easier  to  compute.   Such  a  value  is 


,-i 


,-i-l 


K.    =2   (±3/2 -S    +2    ).   Actually,  in  the  above  there  appears  to  be  a 


-51- 


slight  difficulty  with  the  iteration  i  =  0  since  then  K    ■  25/36,  and 

+1 
K    =  4/9.   However,  since  R   >   0,  the  negative  values  are  not  required. 

The  same  difficulty  appears  in  our  suggestion,  and  is  resolved  the  same  way. 

With  our  suggested  criterion  for  determining  the  q.  and  the  replacement 

of  R.  by  2  R.,  an  algorithm  for  computing  minimally  represented  square  roots 

is  as  follows.   We  remove  the  apparent  difficulty  on  the  first  iteration  by 

initializing  differently. 


Let 


S  = 
o 


1   if  x  >  1/2 


0  if  x  <  1/2 


and 

For  i  =  1,2,.. .,N 


M=x-S   =x-S 
o        o         o 


K.1  =  ±  3/2-S.  .  +  2  1  1% 
l  i-l 


then  determine  q.  by 


qi  = 


1   if  K.   <  2M.  , 
i      i-l 

-1   if  K.~  >  2M.  . 
l      i-l 

0  otherwise 


Then 


M.  =  2M 

l 


i-l 


-  «i(2si-i  +  «i2-1) 


S.  =  S.  .  +  q.2 
l    i-l    l 


-l 


The  same  comments  pertaining  to  roundoff  error  made  for  previous 
methods  applies  here.   Metze  shows  that  the  remainder  R^  is  bounded  by 


±0 


K^   ,  thus  we  have 


,-N. 


.-N 


-N 


,-N, 


2"]N(-5/3.SN_1  +  25/36-2"™)  <  x  -  S^  <  2  "(+5/3.SN^  +  25/36-2  ). 


-52- 


-N  2  -N 

Now,    SXT   .    ~    S.T,    and   2        <<   SXT,    so  x   -  SXT   is  bounded  by  about   2      'S/S-S^   < 
N-l  N  N  N  J  N 

-N+l 
2  S.T.      The   latter   can  be   shown   to  be   a  rigorous  bound   for  moderate 

N 

values   of  N.      We   are   interested   in   a  -  S^,   not  x   -   S„    ,   but   recalling   that 


S„  '     /x,   we  have 


N 


*£  -   S 


N 


X/SN 

7x~T~s 


N 


x  -  S, 


2"N+1S  -N 

<   Z  bN  =    2   N 


2S 


N 


2S 


N 


Thus  the  error  is  no  greater  than  one  in  the  last  bit. 

DeLugish  gives  a  similar  procedure  where  the  comparison  constant  is 

not  a  function  of  the  partial  result.   Let  U  =  x  /4,  then  the  initialization 

o    o 


step  is 


U.  =  1/2 (x  -  r  ") 
1       o    o 


R.  =  r 
1    o 

where  r   is  given  in  Table  8. 
o 

the  table. 


The  comparison  constant  c   is  also  given  in 


interval  for  x 
[1/2,  5/16) 
[5/16,  3/8) 
[3/8,  9/16) 
[9/16,  7/8) 
[7/8,  1) 


o 
1/2 

9/16 

5/8 

3/4 

1 


3/8 

7/16 

1/2 

5/8 

3/4 


Table  8 
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For  i  =  1,2,..., N-l 


s.  = 

1 


1 

-1 

if     U.    <  - 

i 

c 
4 
1 

1 

if     U±     > 

c 
4 

0 

otherwise 

•_9  -i-1 

and  U.^  =  2U.  -  1/2 -s . (R.  +  s .  2  1  Z)     R,+1  =  R,  +  s.2 
l+l     i        i  i    i      '      1+1    i    i 


-N 

so 


DeLugish  shows  that  |  oc  -  R .  |  ^2   ,  hence  |  Jx   -  R^  |  <  2   , 

-N 
truncation  error  is  bounded  by  2   .   Because  of  the  way  DeLugish  initializes, 

2  guard  bits  are  required  to  avoid  error  in  the  initial  calculation  of  U 

o 

and  in  the  latter  stages  of  the  U.  sequence.  This  is  the  same  problem 
referenced  earlier  when  discussing  the  magnitude  of  the  partial  remainders 
and  a  possible  scaling  with  guard  bits  to  contain  the  right  shift. 

4.8.3  Pseudo-Division/Multiplication  Methods 

Meggitt's  method  can  be  used  to  compute  square  roots,  as  Table  3  shows. 
The  only  complication  is  that  the  recursion  for  x .  -  has  three  terms,  but 
this  is  common  to  most  methods.   Roundoff  error  will  occur  in  the  calculation 

of  x   ,  but  as  before,  the  effect  is  less  than  1/2  in  the  Nth  digit,  or  less 

-N-l 
than  2    ,  with  no  guard  bits  required. 

4.8.4  Normalization  Methods 

Normalization  methods  are  for  the  calculation  of  y/»oc  ,  and  toe  is  obtained 

by  setting  y  =  x.   The  procedures  are  similar  to  those  for  division,  except 

2 
that  x.  is  multiplied  by  a.   ,  requiring  a  three  term  sum,  as  in  the  hand 

methods.   We  assume  1/4  <  x  <  1. 

DeLugish  initializes  by  x  =  x  ,  R  =  y,  then 
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U  =  rx  -  1 
1    o  o 


R,  =  R  /r    ,   where 


L  =  R  Sc 
1    o   o 


r  = 
o 


4  if  1/4  <  x  <  1/2 
o 


1   if  1/2  <  x   <  1 
o 


For  i  =  1,2,  .. .  ,N 

'   -1  if  U  >  3/8 
i      1   if  U.<  3/8 
0  otherwise    , 
and  then 

Ui+1  "  2Ui  +  Si  +  2-i(2V.s.+m-s.2)   +  UlS.2-2-2±-1 

h+l   "  Ri  (1  +  Si2"i"1) 

In  the  above,  we  note  that  a.  =  1  +  s.2      and  U .  , .,  =  2  (x.,.,-1), 

l        l  l+l       i+1 

where  x,,.  =  x.(l  +  s.2    )  . 
l+l    l      l 

DeLugish  gives  the  error  bound  Ix^,-,  -  1|  ^  3/4*2   ,  and  this  gives  a 

_N    -N-l 
truncation  error  bound  of  about  3/8  .2   <  2     .   Roundoff  error  bounds 

are  the  same  as  most  other  DeLugish  algorithms.   The  average  number  of  non-zero 

a.  is  about  N/3.   It  is  not  known  what  the  maximum  number  of  non-zero  s.  is, 

but  in  any  case  log_N  +  1  guard  bits  are  sufficient. 

The  Chen  algorithm  again  counts  left  zeros  in  1  -  x.  =  2   iJ+V.   , 

0  <  V.  <  2  pi  "  ,  and  p.  is  two  plus  the  number  of  left  zeros  in  1  -  x.. 

l  '      l        Y  i 

Then  a.  =  1  +  2_pi  ,  and  1  -  x.n  =  1  -  a.x.  =  1  -  (1  +  2~Pi)2(l  -  2~Pi   +  V.) 
i  '  i+1        11  i 

=  V.  +  0(2   Pi    ),  thus,  again  the  leading  one-bit  is  eliminated. 

The  termination  algorithm,  again  applied  when  p  >  t,  is  y/ni  ;:  y  + 
1/2 -y  (1  -  x  ).   For  purposes  of  reducing  truncation  error,  the  termination 
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-N-2 
algorithm  y//x  ~  y  +  1/2  yn(l  -  x^  +  2  )  may  be  used.   Truncation  error 

-N-2 
is  then  bounded  by  2    .   The  same  comments  about  roundoff  error  made  in 

other  discussions  of  Chen's  normalization  algorithms  apply  here. 

4.8.5   Iterative  Procedures 

2 
The  classical  iteration  for  the  solution  of  Z   -  x  =  0  is  the  Newton 

iteration,   Z.,n  =  1/2 -(Z.  -  x/Z.),  with  Z   given  as  an  initial  guess. 

1+1  11  o  & 

Convergence  is  quadratic.   Many  papers  have  been  written  concerning  optimal 
starting  values  and  other  aspects  of  the  iteration.  As  written  above,  a 
division  is  required. 

Ramamoorthy,  Goodman,  and  Kim  [49]  have  made  a  study  of  iterative 
methods  which  use  only  multiplication,  except  for  a  possible  final  division. 
The  iteration 

R±+1  -  £  (3  -  xR.2) 

-1/2 
converges  to  x      for  appropriate  R  . 

Another  iteration,  involving  two  variables  is 

R1+i  -  V2  -  BiV 
Bi+i  -  1/2(Bi  +  XIW   • 

-1/2  1/2 

Here  {R.}  converges  to  x     and  {B.}  to  x 

It  is  interesting  to  note  that  if  B  =  N  for  all  i  in  the  first  equation, 
that  this  is  the  iteration  for  the  reciprocal  of  N,  thus  if  this  type  of 
division  were  employed,  the  square  root  could  be  implemented  easily. 
4.9  Product 

Unlike  many  functions,  the  product  is  easily  implemented  in  a  straight- 
forward fashion.   A  modification  of  the  usual  successive  shift  and  add  method, 
additive  normalization,  is  suggested  by  DeLugish  [16].   Ling  [32]  has  given 
a  method  based  on  logical  circuits  to  form  the  product  of  8  bit  segments  of 
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numbers.   Chen  [4]  gives  another,  somewhat  similar  algorithm,  based  on 

AB  =  (— — )   -  (— t— )   .   Logan  [36]  has  a  similar  proposal  using  "squaring" 
chips.   And  of  course,  the  CORDIC  algorithm  will  yield  a  product. 

In  general  we  assume  that  we  wish  to  compute  yx,  where  1/2  <  x  <  1  and 
y  may  be  any  number. 
4.9.1   CORDIC  Algorithm 

Entering  the  CORDIC  algorithm  with  m  =  0,  y  =  0,  in  the  rotation  mode, 


yields  yN    ~  xz,  where  in  this  instance  |z|  < 
o 


1.   We  have  yXT  ,,  = 
^N  +1 
o 


N  N 

o  o 

:(-£   a.)  ,   and  |  z  f  £   a 


i=0 


i=0 


o        o 


-N-l 


-N-l II  ii 

Thus,  the  truncation  error  is  no  more  than  2  \x\  >    and  if  |x|  <  1,  the 

-N-l  -N-l 

bound  is  2     .   The  roundoff  error  is  also  bounded  by  2     for  a  total 

-N 
error  of  no  more  than  2   ,  or  one  in  the  last  bit. 

4.9.2  "Hand"  Methods 

The  usual  shift  and  add  method  was  outlined  in  Section  3.2.   There  is 

no  truncation  error,  and  N  guard  bits  are  required  for  no  roundoff  error 

(i.e.,  a  double  length  register),  but  log~N  +  1  guard  bits  renders  roundoff 

-N-l 
error  less  than  2     ,  with  chopping. 

DeLugish  uses  the  following  additive  normalization.   Let  U  =  x>  P  =  0, 


Initialize 


U,  =  2(U  -  m  ) 
1      o    o 


where 


m 
o 


Pn  =  P  +  ym 
1    o     o 


1/2   if   1/2  <  x  <  3/4 


if   3/4  <  x   <  1 
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For  i  =  1,2,. .. ,N 


1 

-1     if     U     <  -3/8 

s  .    = 

1 

< 

1     if     U     >   3/8 
i 

0     otherwise 

ul+l  = 

■20, 

-   s. 

l 

then 

l+X       1      1 

Pi+1  =  Pi  +  ^i2""1   ' 

-i-1  i     i~1 

Letting  m.  =  s.2  '   "  for  i  >  1 ,  we  have  U.  =2  (x  -  )   m.),  and 

3=0  J 
i-1 

j-0  J 

Thus  the  truncation  error  is 

N 

yx  "  pn+i  =  y(x  ~  I     mi>' 

j=0  J 


>-N 


N 
DeLugish  shows  that  | x  -  £   m  |  <  3/8*2  N, 

j-0  3 

i  i       -N    -N-l     i  i 
thus  the  truncation  error  is  bounded  by  | y | . 3/8 • 2   <  2     if  |y|  <  1. 

Roundoff  error  again  depends  on  the  number  of  non-zero  s, ,  which  average 

about  N/3  again.   By  an  argument  similar  to  that  for  division,  it  can  be 

shown  that  no  more  than  two  successive  s.  can  be  non-zero,  thus  at  most 

l 

about  2N/3  of  the  s.  can  be  non-zero.   Hence,  about  log?2N/3  +  1  guard  bits 

-N-l 
are  sufficient  to  bound  roundoff  error  by  2    .   Total  error  is  no  more  than 

one  bit. 

4.9.3  Other  Methods 

Several  methods  have  been  derived  for  obtaining  the  product  by  decomposing 

it  into  a  sum  of  several  other  terms  which  are  determined  by  means  of  logical 

circuits.   Ling  gave  a  method  whereby  the  product  of  two  8  bit  numbers  is 

obtained  in  three  additions.   For  longer  word  lengths,  the  idea  is  applied  to 
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segments  of  the  numbers  in  parallel,  and  then  the  product  found  by  summing 
Time  is  about  1  +  2k  addition  times,  where  the  word  length  is  8  .  2 
k  >  1. 

Basically,  the  idea  is  to  form 

2£  =  t  =  2n±   (   1/2  <  ±  <  i 


x- 


£  =  J  =  2nj   ,   1/2  <  j  <  1   . 


2 

2 

2 

Then,  with   S (z)  =  z  -  —  ,  s(i)  and  S ( j )  are  formed  by  means  of  logical 
circuits.   Then  it  is  easily  verified  that 

xy  =  2  {2nl  -  2mj  -  22nS(i)  +  22mS(j)}. 
As  many  as  26  terms  involving  seven  variables  occur  in  the  definition  of 
the  bits  of  S  (z  )  . 

Chen  uses  the  idea  that  xy  =  (~ ^  )   -  (~~ ^  )    »  and  a  decomposition 

of  the  square  of  a  number  into  the  sum  of  three  quantities  (for  8  bits; 

2 
two  quantities  for  4  or  6  bits),   Z  =  R  +  S  +  T. 

The  expressions  for  R,  S,  and  T  are  decided  upon  by  considering  the 

"squaring  parallelogram"  that  results  when  forming  the  square  with  pencil 

and  paper.   Symmetry  can  be  exploited,  and  the  decomposition  given  by  Chen 

results  in  each  bit  involving  no  more  than  14  terms  and  six  variables. 

Logan  [36]  has  a  similar  idea  using  squaring  chips  [35],  except  that 

2    2    2 
he  forms  2xy  =  (x+y)   -  x  -  y  .   This  would  appear  to  be  more  costly  in 

terms  of  both  time  and  hardware  than  using  xy  =  (  ^)      -    (  o  )  • 

The  above  procedures  form  a  double  length  product,  thus  there  is  no 

error  involved. 
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5.0  Conclusion 

Several  conclusions  can  be  drawn  from  this  study,  depending  on  the 
point  of  view  taken.   If  simplicity  and  generality  is  the  main  objective, 
then  the  CORDIC  algorithm  of  Walther  [66]  should  be  heavily  favored.   If 
speed  is  of  principal  importance,  then  the  normalization  method  proposed 
by  Chen  [5]  is  excellent  for  its  purpose.   We  note  that  Chen's  algorithm  is 
the  subject  of  a  U.  S.  patent.   For  implementation  with  more  conventional 
hardware,  and  a  good  compromise  between  simplicity  and  speed,  the  set  of 
normalization  algorithms  proposed  by  DeLugish  must  be  highly  rated. 

The  special  purpose  algorithms  usually  (although  not  always)  are 
faster  than  those  which  are  part  of  a  unified  algorithm.   The  minimal 
representation  algorithms  by  Metze  [41],  [42]  should  be  excellent  for  their 
purpose.   The  decomposition  schemes  for  performing  multiplication,  proposed 
by  Ling  [32],  Chen  [4],  and  Logan  [36],  appear  to  be  capable  of  high  speeds, 

As  was  noted  earlier,  the  only  method  available  for  computation  of 
trigonometric  functions  is  the  CORDIC  algorithm  and  variations  of  it.   It 
appears  that  this  is  one  area  where  some  additional  study  is  necessary  to 
develop  faster  and  more  efficient  algorithms. 
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Appendix  A:   Block  Diagrams 
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BLOCK   DI    GRAM   -   CC:?DIC  ALGORITHM 


X.Y.  Z.m.mode 


rr  L  .it  i  An 


Register 

Contents 

R 
S 

X. 

T 
A 
I 

z1 

Z.    or  Y. 

i    .        i 

1 

Yti^torinn 


(T) 


sign   sensor 
*     s.=sign  c(A) 


A*-  c(S) 


complementor 
ms  . c ( . J ) 


rtii 


complementor 

-s.c(R) 
l 


shifter 

ms,c(J)2"Pi 
i 


-P, 


shifter 

-s.c(R)2~Pi 


+%. 


adder 

R  *-  c(R)ims.c(3)2"Pi 


adder 
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BLOCK  DIAGRAM   -   PSEUDO-MULTIPLICATION 

x.y.pF=l"gJ.fctn 

product,  tan •^'^f^^P^^^exp, square 


II*-  1 
I  *-  0 


C  *-  x 
Z-y 
X^  x 


Register  Contents 

X      pseudo-multiplicand, x. 
pseudo-product , z . 
multiplicand  modifier 
x 

index, i 
index  modifier, ±1 


Z 
M 
C 
I 
II 


*0 


product 


=0 


M«-  0 


fctn 
exp 


square 


tan 


M<-  c(X) 


M  «-  -c(Z) 


shifter 
M  <^c(M)2 


-i 


-i 


M«-  -c(C) 


I 


shifter 
M  <r  c(M)2 

— r~ 


•  2i 


r2i 


shifter 
M  <r  c(M)2 


•  i+1 


hi+1 


adder 

Z  <£-c(Z)+c(X) 


X. 


1 


adder 

X*-  c(X)+c(M) 


asmflffi 


shifter 
Z  «-  c(Z)2 


c(II) 


c(II') 


adder 

I  «-  c(I)+c(II) 


^0  and  j=n  ^   c(j)  ^»  =0  or  =n 


M^  c(C) 


fctn 

otherwise 


shifter 

M*-  c(M)2 
r 


-i-1 


•i-1 


adder 

X  *•  c(X)+c(M) 


\        EXIT  J 
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BLOCK  DIAGRAM   -   PSEUD0-DIVI3I0N 


x.y. 

fctn 

J 

Q 

*•  0 

A 

^y 

B 

*■  X 

C 

*-  X 

I 

«*-  0 

quotient 


M^  0 


Register   Contents 

A  pseudo -dividend,  z. 

B  pseudo-divisor,  x. 

Q  pseudo-quotient,  Q. 

M  dirisor  modifier 

I  index,  i 

C  x 


square  root 


tan 


M  *-  -c(B) 


shifter 
M  *■  c(M)2" 


,-i 


M*-  c(A) 


M— c(C) 


shifter 
M*-  c(M)2 


-2i 


,-2i 


shifter 
M*-  c(M)2 


-i+1 


•i+1 


adder 

A*-  c(A)  -  c(B) 


shifter 
Q  ±  2c(,Q) 


*0  ^ 

:(A) 

N  < 

0 

\  { 

^               1 

1 

t 

f 

^^ 

1 

' 

adder 

adder 

adder 

B-  c(B)+c(M) 

Q*-c(Q)  +  1 

A*-c(A)+c(B) 

■\ 

r 

> 

1   " 

r 

1 

shifter 

A*-  2c(A) 

£  n 


square 
root 


M-*-  c(C) 


otherwise 


c(I) 


>  n 


shifter 
M«-c(M)2 


•  i-1 


(   EXIT  J 


ri-1 


adder 
B*-c(B)-c(M: 
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BLOCK  DIAGtfhfo   -  NORMALIZATION  METHODd 


log 

<Tfctn  ^> 
[exp 

ratio,    square   i 

. —      i 

L 

>  t 

R  «-  x 

J  <-  y 

S  «-x 
T-y 

R  *-  x 
T*-y 

l 

1 

j 

' 

select  s .  and  p. 
1     *i 


U  «-  c(R) 


complementor 
s.c(U) 


shifter 
c(U)2 


-m 


-m 


adder 

R  *-   c(R)+c(U)2 


-m 


adder 


►  -Pi 


Register 
R 
S 
T 

I 


S  «*-  c(S)+(-log(l+8.2  Kl)) 


fctn 


otherwise 


square 
root 


j£e$ 


no 


EXIT 


adder 

I  *•  c(I)+l 


m  = 


p. ,fctn#square  root 

p.-l,fctn=  square  root,  Is*  time 

p.+l,fctn=  square  root,  2nc*  time 


Contents 

Xi 

yi   or  xt 

yi 

l 


complementor 
siC(T) 


shifter 


c(T)2"Pi 


Pi 


c(T)+s.c(T)2  ^ 


±JL 


<N 
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BLOCK  DIAGRAM  -  DE  LUGISH  MULTIPLICATION 


X 

,y 

w 

U—  x 

P  *-0 
I  ^1 

Register   Contents 
U         U, 


P 
I 

T 


test  value 


q«e--l  * 


^  3/4 


c(U) 


5  3/4, 


q^-0 


shifter 
-2q 


shifter 

tL 


adder 

U«~  c(U)-2C 


adder 
P«-c(P)+y2 


shifter 
U*-  2c(U) 


s.^-  1 

1 

u>0^ 

c(U) 

s^^Q, 

S,    <r    -1 

,  1 

• 

> 

» 

adder 

T  «-c(T)-3/8 

adder 

T  *■  3/8-c(T) 

shifter 

U  *■  2c(U] 


shifter 
Y2-1-1 


-i-1 


complementer 


3 


adder 

U^  c(U)  -  a, 


t 


adder 

P*-  c(P)+ys.2 


-i-1 
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BLOCK  DIAGRAM  -  METZE  DIVISION 


it 


Register   Contents 


R  *-  y 
Q«-  0 
I<-  0 


R 

Q 
t. 

I 
T 


<  39/64 


15/16 


K-:- 13/32 


K  *-  1/2 


**- 


K<-  5/8 


K^  3/4 


c(R) 


>,0 


adder 

T*-  c(K)-c(R) 


I 


adder 
T«-c(R)-c(K) 


q.  *•  -1 
1 


qi*-  1 


}  0 


complementor 


qi 


c(T) 


<   0 


complementor 


"qiX 


*i 


adder 

Q«-  c(Q)+q.2 


-i 


adder 

R*-  c(R)-qiX 


shifter 
R*-  2c(R) 


2R. 


(3Ri-J^sign(R.) 


adder 

I«-  c(I)  +  1 


i  N 


c(I 


>N 


EXIT 
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BLOCK  DIAGRAM  -  METZE  SQUARE  ROOT 


X 


M  *-  x 
I-  1 


Register 

Contents 

M 
S 
I 
K 
A 

partial  remainder 

approx.  square  root 

i 

K 

test  value 

£_i 


adder 
M*-  x+(-l) 


S+-  0 


shifter 
K^-ic(S) 


rl 


shifter 
M*-  2c (M) 


adder 
K<-c(K)+c(3) 


adder 
A<-c(M)-2 


-i-1 


c(K) 


q^-1 


£ 


q.-*-l 
i 


EXIT 


complementer 
A*--c(A) 


adder 

K  *■  c(K)+c(A) 


complement or 
-qjC(3) 


-q 


complementer 


a. 


shifter 
-2q.c(5) 


adder 
M«-  c(MH-2q<c(3)) 


adder 

S«-  c(S)4q.2 


-i 


adder 

M«-  c(M)  -  2 


-i 


adder 

I*-  c(I)  +  1 


^^c(I)  ^^L 


<^T) 
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Appendix  B:   CORDIC  Simulation  Program 
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C      PRCGRAM  C  0  R  0  I  C 

C 

C      THIS  FFGGPAN  SIMLLATES  THE  GENERALIZED  CORCIC  ALGORITHM  OF  WALTHER 

C 

C      PCM  VALUES  ARE  CCi»PUTED  WITH  ROUNDING  TC  NPB  +  NPG  BITS  ACCURACY 

C 

C      IN  THE  ITERATIONS,  CHOPPING  IS  USED  AFTER  MULT  IPLI CAT ICNS.   THIS 

C      CCPPESFCNCS  TC  A  PIGFT  SHIFT  OFF  THE  END 

C 

C 

C      INPUTS  ARE  AS  FCLLCWS 

C 

CM  IS  P 

C      MODE  IS  1  FCR  ROTATION,  2  FOR  VECTORING 

C      NPE  IS  THE  NUMBER  CF  PRECISION  BITS  OF  THE  COMPUTER 

C      NGc  IS  ThE  NUMBER  CF  GUARD  EITS  WITH  WHICH  THE 

C  CALCULATIONS  ARE  PERFORMED   (NOTE  THAT  WITH  32  BIT 

C  ARITHMETIC  THE  VALUE  CF  NPB+NGB  MUST  BE  NO  MORE  THAN 

C  31) 

C      IC  OUTPUT  IS  GIVEN  ON  THE  ITERATION  WHEN  I/IO  IS  ^^ 

C  INTEGER 

C      X  THE  1NPLT  VALUE  OF  X  (SHCULD  BE  LESS  ThAN  ONE) 

C      Y  THE  INPUT  VALUE  OF  Y  (SHOULD  BE  LESS  THAN  ONE) 

C      Z  THE  INPUT  VALUE  OF  Z   IS  IN  DEGREES  IF  M  =  1,  ANC 

C  SHCLLD  BE   180  GR  LESS  IN  MAGNITUDE.    IF  M  IS  -1 

C  CR  0    1    SHOULD  BE  LESS  THAN  ONE  IN  MAGNITUDE) 

C 

CIMENSICN  RCM(2,5C)  ,  NX  1  (  32  )  ,JMYI  (  32)  ,NZI(32) 

INTEGER  RCM,SFP,SFG,TSF,ZI ,X I , Y I , XT EMP ,0EL I , M, MODE , NP6,D 1 , SZ  ,K 
1  , REPEAT, XCFtYCR  ,ZCRtHSFG 

PEAL* 8  D,CATANH,RK,X,Y,Z,PI,ANG,XC,YC,ZC 

CATANH(X)  =  .5DO*CLCG((l.D0  +  X)/(1.CC  -  X)) 

PI  =  DATAN( l.DO)*«.CC 
100  PEAC  1,M,M0CE,NPE,NGE,I0 

IF (NPB.LE.O)STOF 

REPEAT  =  k 

SFF  =  2**NPE 

SFG  =  2**NGE 

HSFG  =  SFG/2 

TSF  =  SFP  *  SFG 
C 

C       INITIALIZE  ROM,CCMPLTE  SCALE  FACTOR 
C 

RK  =  l.DO 

CI  =  1 

IF(M.LE.0)C1  =  2 

I  =  C 

J  =  C 
ku  I  =  I  +  1 

J  =  J  +  1 

C  =  l.CO/Cl 

PCM2,I)  =  TSF/C1 

IF(M)14C,15C,16C 
140  RCM(1,I)  =  CATANF(D)*TSF     +.5D0 

GC  TC  160 
15C  RCM(  1,1)  =  TSF/D1 

GC  TC  180 
16C  R0M(1,I)  =  CATAN(L)*TSF  +  .5C0 
18C  RK  =  RK*(1.CC  +  M*D**2) 
190  IF <M)195,2CC,20C 
195  IF( J.NE.REFEAT)GC  TC  2C0 

1=1  +  1 

REPEAT  =  3*REPEAT  *  1 

RCM1,I)  =  FCM(1,I-1) 

R0M(2,I)  =  R0M(2,  I- 1 ) 

FK  =  PK*(1.C0  ♦  P*D**2) 
2CC  IF(D1.GT.SFP)G0  TC  2C1 

CI  =  2*C1 

GC  TC  120 
201  NMAX  =  I 

IF  (NGE  .LE.OJNMAX  =  NMAX  -  1 

K  =  CSCRT(RK)*TSF  +  .5C0 
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RE/C  2,X,Y,2 

PRINT  5,M,MCLE,NPB ,NGB,X,Y,Z 

SCALE  INPUTS  TC  INTEGER  (NPE4NGB  BITS)  AND  DO  FIRST  90  DEGREE 
RC7ATICN 


IF(i*.GT.O)GC 

11 

XI 

VI 

GO 


ANG  = 
GC  TC 
SZ 
GC 
SZ 

IF(SZ 
ZI  = 
XI  = 
YI  = 
ZI  = 
XI  = 
YI  = 


TC    2C5 
=    Z*SFP    4    DSIGM  .5CC,Z) 
=    X*SFP/K*TSF     *    CSIGN( 
=    Y*SFF/K*TSF 
TC    250 
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